Blog
Insights and updates from the Synnada team
Company
Recap of the Data & Drinks Meetup Featuring Apache DataFusion: Amsterdam 2025
The first Data & Drinks event of 2025 took place in Amsterdam on January 23, hosted by Xomnia at their HQ. This edition focused on Apache DataFusion, drawing a highly technical audience eager to explore its real-world applications and inner workings.
Recap of the First Apache DataFusion Meetup in Europe: Belgrade 2024
On September 27, 2024, the first Apache DataFusion Meetup in Europe took place in Belgrade, bringing together nearly 70 attendees. The event was held at the Microsoft office, where speakers showcased their work and shared insights on how they are utilizing DataFusion in various projects.
Apache DataFusion is a Top Level Project!
Apache DataFusion has been elevated to a Top-Level Project by the Apache Software Foundation, underscoring its maturity and essential role in data processing. This recognition reflects DataFusion's rapid growth, robust performance, and active community engagement.
Apache Arrow, Arrow/DataFusion, AI-native Data Infra — An Interview with Our CEO Ozan
Our CEO Ozan recently joined an episode of the Streaming Caffeine podcast — Streaming Caffeine E10: Ozan from Synnada, about Arrow Datafusion, Rust, Databases, SQL, AI — to discuss our perspective on DataFusion and the future of data infrastructure.
Modern Data Stack and the Data Chasm Part 2: A Path to Leaner Data Systems
This post explores how pioneering teams at Airbnb, Uber, and Apache Arrow overcame the data chasm, followed by an introduction to the Lean Data Stack paradigm as a way to build durable, economical, and flexible data systems.
Modern Data Stack and the Data Chasm Part 1: Emergence of Complexity in Data Systems
The data ecosystem is rapidly expanding and fragmenting, posing integration challenges industry-wide. Many companies fall into a "data chasm" when they must abruptly scale from 2-4 tools to 15-20, which compounds complexity. Some organizations pioneered methodologies to cross this chasm and extract value. How can others navigate it?
AI / ML: The Race for Specialized Electricity Supremacy
This blog post explores the AI/ML landscape, comparing it to a gold rush where the focus is on providing "specialized electricity" in the form of computing, storage, and networking resources.
Next Frontier: Action-Capable Intelligent Agents
The world of AI and data is undergoing a rapid transformation. Enabling technologies are maturing to a level where we should be able to deploy action-capable, autonomous intelligent agents at scale. But what will it take to make this a reality?
Engineering
Running Windowing Queries in Stream Processing
Windowing queries in stream processing play a pivotal role in handling time-series data. This post shows how to use streaming-friendly window functions in queries with nothing but ANSI SQL, emphasizing the importance of ordering for achieving optimal results on streaming datasets.
Sliding Window Hash Join: Efficiently Joining Infinite Streams with Order Preservation
The Sliding Window Hash Join (SWHJ) algorithm joins potentially infinite streams while preserving order. It builds hash tables incrementally and stores only the build-side rows that fall within a sliding window, so streams can be processed efficiently without materializing all data.
Probabilistic Data Structures in Streaming: Count-Min Sketch
The Count-Min Sketch uses hash functions to map streamed items into a 2D counter array. As the stream is processed, each item is hashed once per row and the matching counters are incremented; an item's frequency is then estimated by taking the minimum count across the rows at its hashed positions.
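As a rough illustration of the idea summarized above, here is a minimal Count-Min Sketch in Python. It is a sketch under stated assumptions, not the implementation from the post: the class name, the default `width` and `depth`, and the use of salted BLAKE2b as the per-row hash are all choices made for this example.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch; names and defaults are assumptions."""

    def __init__(self, width=1024, depth=4):
        self.width = width   # counters per row
        self.depth = depth   # number of rows, one hash function each
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Derive one hash function per row by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum is the tightest bound.
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    sketch.add(word)
print(sketch.estimate("a"))  # never less than the true count of 3
```

The estimate can overcount when different items collide in every row, but it never undercounts; widening the rows or adding more of them lowers the chance of an inflated result at the cost of memory.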
General-purpose Stream Joins via Pruning Symmetric Hash Joins
Sliding window joins for stream processing bring DataFusion a step closer to unified data processing. Find out how to join streams efficiently with less memory and how to buffer both join sides intelligently.