What is lambda architecture?
Lambda architecture is a big data systems design pattern that combines both batch and real-time stream processing to meet the latency and throughput needs of large-scale data processing pipelines.
It serves requests using a read-optimized view generated by batch processing while using fast stream processing on recent data to fill the batch latency gap. The batch and stream results are stitched together at query time.
Lambda architecture emerged as a popular pattern but adds complexity of managing separate pipelines. The Kappa architecture and unified processing approaches simplify this by collapsing the batch and streaming into a single pipeline.
What does it do? How does it work?
In a lambda architecture, incoming data is dispatched to both a batch processing system that stores the master dataset, and a lower-latency stream processing system that handles recent data.
Queries are resolved by merging results from both - taking the pre-computed batch results for historical data, and fast real-time stream results for recent data.
Why is it important? Where is it used?
Lambda architecture gained popularity as a pattern for handling large data volumes with high throughput batch processing and low latency real-time analytics.
It can be useful for workloads that require a combination of real-time data queries and complex analytics on the full historical dataset. However, complexity and operational costs have led to alternative unified approaches.
FAQ
What are the components of a lambda architecture?
The core components of a lambda architecture are:
- Batch processing layer for master dataset
- Stream processing layer for recent real-time data
- Serving layer to respond to queries
- Coordination glue code to orchestrate and reconcile batch and stream processing
What are the pros and cons of lambda architecture?
Pros:
- Tuned for latency and throughput
- Robust and fault-tolerant
Cons:
- Complexity of operating disparate systems
- Reconciling batch and stream results
- Repeated reprocessing of data
How is it different from Kappa architecture?
The Kappa architecture aims to simplify lambda architecture by eliminating the batch processing component and relying solely on stream processing for both real-time and historical processing.
When is lambda architecture a good choice?
Lambda architecture excels for workloads requiring:
- Very low latency reads on recent data
- High throughput storage and queries on the full master dataset
- Complex analytical batch processing jobs
References:
- [Book] Deciphering Data Architectures by O'Reilly Media, Inc.
- [Post] Questioning the Lambda Architecture
- [Post] General-purpose Stream Joins via Pruning Symmetric Hash Joins
- [Post] Lambda and Kappa Architectures in Brief
- [Post] Kappa Architecture is Mainstream Replacing Lambda
- [Post] The Lambda Architecture, simplified