Online analytical processing (OLAP) is a category of technologies, tools and techniques that allow analysts to query aggregated business data in multidimensional views to derive insights. OLAP enables complex analytical calculations on historical data for reporting and business intelligence needs.
OLAP systems and processes are optimized for analysis of consolidated enterprise data rather than transactional operations. OLAP tools facilitate activities like data mining, forecasting, what-if scenarios, and multidimensional reporting.
To support multidimensional analytics at scale, OLAP architectures often employ techniques like incremental processing, distributed tracing, and cardinality estimation to efficiently query aggregated data across large datasets. Columnar storage, cubes, and materialized views are also common. The key focus is low query latency, analytical flexibility, and insight rather than transaction throughput.
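For intuition about the columnar idea, here is a minimal, dependency-free Rust sketch contrasting a row-oriented layout with a column-oriented one. The SaleRow and SaleColumns types and the sample figures are illustrative only, not taken from any particular engine; they simply show why an analytical scan over one measure is cheaper when each column is stored contiguously.

```rust
// Row-oriented layout: each record's attributes stored together (typical of
// transactional systems). Reading one measure still drags whole rows through memory.
#[allow(dead_code)] // dimension fields kept only to illustrate the layout
struct SaleRow {
    region: String,
    quarter: String,
    revenue: f64,
}

// Column-oriented layout: each attribute stored contiguously (typical of OLAP
// engines). Scanning one measure touches only that column.
#[allow(dead_code)]
struct SaleColumns {
    region: Vec<String>,
    quarter: Vec<String>,
    revenue: Vec<f64>,
}

fn total_revenue_rows(rows: &[SaleRow]) -> f64 {
    rows.iter().map(|r| r.revenue).sum()
}

fn total_revenue_columns(cols: &SaleColumns) -> f64 {
    // Only the revenue vector is read; region and quarter stay untouched.
    cols.revenue.iter().sum()
}

fn main() {
    let rows = vec![
        SaleRow { region: "EMEA".to_string(), quarter: "Q1".to_string(), revenue: 120.0 },
        SaleRow { region: "APAC".to_string(), quarter: "Q1".to_string(), revenue: 80.0 },
    ];
    let cols = SaleColumns {
        region: vec!["EMEA".to_string(), "APAC".to_string()],
        quarter: vec!["Q1".to_string(), "Q1".to_string()],
        revenue: vec![120.0, 80.0],
    };
    assert_eq!(total_revenue_rows(&rows), total_revenue_columns(&cols));
}
```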
Modern query engines like Apache DataFusion, built on the Apache Arrow columnar format, provide high-performance OLAP querying.
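As a rough sketch of what a DataFusion-based OLAP query looks like, the following Rust program registers a CSV file and runs a grouped aggregation. It assumes the datafusion and tokio crates are available and a hypothetical sales.csv file with region, quarter, and revenue columns.

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Create a session and register a CSV file as a queryable table.
    let ctx = SessionContext::new();
    ctx.register_csv("sales", "sales.csv", CsvReadOptions::new()).await?;

    // A typical OLAP-style rollup: total revenue by region and quarter.
    let df = ctx
        .sql(
            "SELECT region, quarter, SUM(revenue) AS total_revenue \
             FROM sales \
             GROUP BY region, quarter \
             ORDER BY total_revenue DESC",
        )
        .await?;

    // Print the aggregated result to stdout.
    df.show().await?;
    Ok(())
}
```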
OLAP leverages data warehouses and multidimensional databases to store aggregated, historical data. Analysts query these multidimensional data cubes using specialized languages like MDX, and frontend BI tools visualize the results.
OLAP queries manipulate dimensional data on the fly: filtering, pivoting, grouping, and aggregating it across dimensions. Cubes can be reprocessed periodically with ETL.
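The sketch below imitates those cube operations in plain Rust over a tiny in-memory fact table: filter the rows, group by region, and pivot quarters into columns holding SUM(revenue). The Sale struct and the sample values are purely illustrative.

```rust
use std::collections::BTreeMap;

// A hypothetical row in a sales fact table.
struct Sale {
    region: &'static str,
    quarter: &'static str,
    revenue: f64,
}

fn main() {
    let facts = vec![
        Sale { region: "EMEA", quarter: "Q1", revenue: 120.0 },
        Sale { region: "EMEA", quarter: "Q2", revenue: 95.0 },
        Sale { region: "APAC", quarter: "Q1", revenue: 80.0 },
        Sale { region: "APAC", quarter: "Q2", revenue: 130.0 },
        Sale { region: "APAC", quarter: "Q3", revenue: 60.0 },
    ];

    // Slice: keep only the first two quarters. Then group by region and
    // pivot quarters into columns holding SUM(revenue).
    let mut pivot: BTreeMap<&str, BTreeMap<&str, f64>> = BTreeMap::new();
    for s in facts.iter().filter(|s| s.quarter == "Q1" || s.quarter == "Q2") {
        *pivot
            .entry(s.region)
            .or_default()
            .entry(s.quarter)
            .or_insert(0.0) += s.revenue;
    }

    // One line per region, one column per quarter.
    for (region, by_quarter) in &pivot {
        println!("{region}: {by_quarter:?}");
    }
}
```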
OLAP supports strategic decision making through interactive querying of vast amounts of business data. It powers major business intelligence use cases across sales, marketing, operations, finance, manufacturing, retail, healthcare, education and more.
Modern OLAP runs on everything from relational databases to MPP analytic cloud data warehouses such as Snowflake, BigQuery, and Redshift.
OLTP (online transaction processing) handles current operational transactions, while OLAP analyzes historical, aggregated data for business insights.
Common architectures include ROLAP (relational OLAP, built directly on relational databases), MOLAP (multidimensional OLAP, using pre-computed multidimensional arrays), HOLAP (hybrid OLAP, combining relational storage for detail data with multidimensional storage for aggregates), and SOLAP (spatial OLAP, for spatial data).
Key components are:
Incremental processing involves continuously processing and updating results as new data arrives, avoiding having to recompute results from scratch each time (see the sketch at the end of this section).
Distributed tracing is a method used to profile and monitor complex distributed systems by instrumenting applications to record timing data across components, letting operators analyze bottlenecks and failures.
Apache DataFusion is an extensible, high-performance data processing framework written in Rust, designed to efficiently execute analytical queries on large datasets. It uses the Apache Arrow in-memory data format.
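As referenced above, here is a minimal sketch of the incremental processing idea: a running per-region rollup that is updated batch by batch, so new data contributes only its delta instead of forcing a recomputation over the full history. The Rollup state and batch shape are assumptions made for illustration.

```rust
use std::collections::HashMap;

// Running rollup state: SUM(revenue) and COUNT(*) per region. With
// incremental processing, each new batch updates this state in place
// instead of triggering a recomputation over all historical data.
#[derive(Default)]
struct Rollup {
    sum: f64,
    count: u64,
}

fn apply_batch(state: &mut HashMap<String, Rollup>, batch: &[(String, f64)]) {
    for (region, revenue) in batch {
        let entry = state.entry(region.clone()).or_default();
        entry.sum += revenue;
        entry.count += 1;
    }
}

fn main() {
    let mut state: HashMap<String, Rollup> = HashMap::new();

    // First batch of arriving rows.
    apply_batch(&mut state, &[("EMEA".to_string(), 120.0), ("APAC".to_string(), 80.0)]);
    // A later batch: only this delta is processed; results stay current.
    apply_batch(&mut state, &[("EMEA".to_string(), 95.0)]);

    for (region, r) in &state {
        println!("{region}: sum={} count={}", r.sum, r.count);
    }
}
```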