The Apache Software Foundation (ASF) has announced that Apache DataFusion is now a Top-Level Project (TLP). This recognition underscores DataFusion’s maturity and its vital role in modern data processing.
DataFusion is an extensible query execution framework written in Rust, designed for high-performance, in-memory data processing. It supports SQL queries and builds on Apache Arrow's in-memory columnar format for efficient execution. The project has gained traction rapidly: as of 6 Aug 2024, it has over 5.7k stars on GitHub, numerous contributors, and an active user community.
While specific metrics can vary based on the use case and environment, some general performance metrics and capabilities of DataFusion include:
Apache DataFusion's versatility and high performance have made it a popular choice for numerous projects across various domains. Here are some key projects and emerging initiatives leveraging DataFusion:
DataFusion's capabilities extend beyond these primary projects, supporting a wide range of other applications:
Notable examples include Cube Store for scalable storage and Spice.ai for SQL interfaces. Others are Dask SQL, a distributed SQL engine in Python; delta-rs, the Rust implementation of Delta Lake; and Exon, a life-science analysis toolkit. Database projects built on DataFusion include CnosDB, an open-source time-series database; GlareDB, for distributed SQL queries; and GreptimeDB, a cloud-native time-series database. Additional tools include HoraeDB, Kamu, LakeSoul, Lance, ParadeDB, Parseable, qv, Restate, ROAPI, Seafowl, VegaFusion, and ZincObserve. Longstanding users such as Space and Time and SDF Labs showcase the diverse applications of DataFusion.
If you are interested in learning more about Apache DataFusion, numerous resources are available to help you get started and join the community. The Apache DataFusion GitHub page offers documentation, source code, and examples to explore, and the ASF project page provides an overview of the project's goals, features, and latest updates. Community participation is encouraged: join the DataFusion mailing list and follow the project on social media to take part in discussions, contribute to the project, and stay informed about upcoming events and developments.