What is a bloom filter?
A Bloom filter is a probabilistic data structure designed to support membership queries on a set in a space-efficient manner allowing false positives but no false negatives.
It can indicate if an element is definitely not in a set, or probably in a set. Bloom filters have very compact representations compared to storing the full set explicitly.
Bloom filters rely on hash functions and interval arithmetic to map set elements to array bits efficiently. They are used in systems like databases, networks and browsers for scalable set representation. Related structures like B-trees, distributed hash tables, and skip lists have comparable tradeoffs.
How does it work?
A Bloom filter consists of a bit vector initialized to 0s and uses multiple hash functions to map elements to random index positions. To add an item, the bits at the hashed indexes are flipped to 1. To check membership, you check if the bits are set at all the hashed positions.
Why is it important? Where is it used?
Bloom filters enable space-efficient representation of sets and fast membership checks with adjustable error tradeoffs. Uses include network routing, databases, caching layers, data streams and anywhere space matters.
The compact probabilistic representation enables scaling massive sets with small memory overhead. The tolerance for false positives provides the efficiency gains.
FAQ
What are the key properties of a Bloom filter?
What are the disadvantages of Bloom filters?
What parameters can be tuned in a Bloom filter?
Key parameters:
What are common applications of Bloom filters?
References
Related Topics
Count Min Sketch
A Count Min Sketch is a probabilistic data structure used to estimate item frequencies and counts in data streams.
Distributed Hash Table
A distributed hash table (DHT) is a decentralized distributed system that partitions a key space across nodes and uses hash functions to assign ownership and locate data.
Skip List
A skip list is a probabilistic data structure that provides fast search and insertion over an ordered sequence using hierarchy of linked lists to skip over elements.
B-tree
A B-tree is a tree data structure optimized for fast indexed key lookups and writes on disk storage while keeping the tree balanced.