What is unified processing?
Unified processing is an approach to designing big data architectures where the same engine handles both batch and streaming workloads, rather than having separate batch and stream processing systems.
This simplifies the architecture because there is no need to merge results or coordinate across distinct platforms. Workloads can draw on both batch and stream processing capabilities through one system.
Unified processing is a key enabler of the Kappa architecture pattern. Modern stream processors have adopted unified processing capabilities to replace the specialized batch processing engines used in the Lambda architecture.
What does unified processing do? How does it work?
In a unified system, a single engine such as Apache Flink or Apache Spark processes data with the same runtime, APIs, and storage layer, whether the input is a bounded batch dataset or an unbounded stream.
The engine provides common abstractions that cover both cases, typically by treating a batch as a stream that happens to end. This makes it easy to switch between stream and batch views of the same data.
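As a rough illustration of this idea, the Python sketch below applies one transformation to both a bounded list and an unbounded generator. It is not Flink or Spark code. The names running_totals, batch_result, and endless_events, and the (user, amount) event shape, are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]  # (user, amount): a hypothetical event shape


def running_totals(events: Iterable[Event]) -> Iterator[Dict[str, int]]:
    """Yield a snapshot of per-user totals after each event.

    The same logic serves both modes: a batch caller keeps only the
    last snapshot, a streaming caller consumes each one as it arrives.
    """
    totals: Counter = Counter()
    for user, amount in events:
        totals[user] += amount
        yield dict(totals)


def batch_result(events: Iterable[Event]) -> Dict[str, int]:
    """Bounded input: run to completion and return the final totals."""
    result: Dict[str, int] = {}
    for result in running_totals(events):
        pass
    return result


def endless_events() -> Iterator[Event]:
    """Unbounded input: a toy source that never ends."""
    i = 0
    while True:
        yield (f"user{i % 2}", i)
        i += 1


# Batch view: a finite list processed to the end.
bounded = [("user0", 0), ("user1", 1), ("user0", 2), ("user1", 3)]
final_totals = batch_result(bounded)

# Stream view: the same function over an endless source; take 4 snapshots.
stream_snapshots = list(islice(running_totals(endless_events()), 4))
```

In this sketch the fourth streaming snapshot equals the batch result, because both paths run the same function over the same four events. A real engine adds concerns this toy omits, such as state checkpointing, event-time windows, and distribution across workers.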
Why is it important? Where is it used?
Unified processing reduces complexity compared to hybrid approaches. It improves developer productivity and makes operational management easier.
Use cases include web and mobile analytics, data pipelines, IoT, fraud detection and other applications requiring flexibility between batch and real-time processing. Unified solutions can replace Lambda architectures and provide a natural basis for Kappa architectures.
FAQ
How does unified processing contrast with lambda architecture?
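Lambda architecture runs separate batch and stream processing systems and merges their results at query time. Unified processing eliminates the need to combine separate systems: one engine processes all data, and there is one way to query it.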
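The contrast can be sketched in a few lines of Python. This is a toy model under stated assumptions, not code from any Lambda or unified framework: the function names and the page-view data are hypothetical, and in-memory Counters stand in for real batch, speed, and serving layers.

```python
from collections import Counter
from typing import Dict, Iterable, List


def batch_layer_view(historical: Iterable[str]) -> Counter:
    """Lambda batch layer: periodic recomputation over stored history."""
    return Counter(historical)


def speed_layer_view(recent: Iterable[str]) -> Counter:
    """Lambda speed layer: a second implementation for events not yet batched."""
    view: Counter = Counter()
    for page in recent:
        view[page] += 1
    return view


def lambda_query(batch_view: Counter, speed_view: Counter) -> Dict[str, int]:
    """Lambda serving layer: merge the two views at query time."""
    return dict(batch_view + speed_view)


def unified_query(events: Iterable[str]) -> Dict[str, int]:
    """Unified processing: one implementation over all events, old and new."""
    return dict(Counter(events))


historical: List[str] = ["/home", "/docs", "/home"]
recent: List[str] = ["/docs", "/pricing"]

lambda_result = lambda_query(batch_layer_view(historical), speed_layer_view(recent))
unified_result = unified_query(historical + recent)
```

Both paths produce the same counts here, but the Lambda-style path needs two implementations of the counting logic plus a merge step, and each is a place where the results can drift apart.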
What are some key unified processing technologies?
Examples include Apache Flink, Apache Spark Structured Streaming, Amazon Kinesis Data Analytics (since renamed Amazon Managed Service for Apache Flink), Google Cloud Dataflow and Azure Stream Analytics.
What are key benefits of unified processing?
Benefits include simplified development, no context switching between different semantics, reduced operational complexity and easier ways to process data as streams or batches.
What are potential downsides of unified processing?
Potential downsides include immaturity compared to specialized engines, possible performance or cost penalties from generality, and limitations in very high-scale stream processing scenarios.
When is unified processing appropriate?
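Unified approaches excel for use cases that need flexibility between batch and real-time processing, such as web and mobile analytics, data pipelines, IoT and fraud detection.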
Related Topics
Kappa Architecture
Kappa architecture is a big data processing pattern that uses stream processing for both real-time and historical analytics, avoiding the complexity of hybrid stream and batch processing.
Batch Processing
Batch processing is the execution of a series of programs or jobs over a set of data, grouped into batches and run without user interaction, to process high volumes of data efficiently.
Lambda Architecture
Lambda architecture is a big data processing pattern that combines batch and real-time stream processing to get the benefits of both high throughput and low-latency querying.
Apache DataFusion
Apache DataFusion is an extensible, high-performance data processing framework in Rust, designed to efficiently execute analytical queries on large datasets. It utilizes the Apache Arrow in-memory data format.
Apache Arrow
Apache Arrow is a cross-language development platform for in-memory data, specifying a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations.