Bloom Filter

Algorithms/Data Structures
Updated on:
May 12, 2024

What is a bloom filter?

A Bloom filter is a probabilistic data structure designed to support membership queries on a set in a space-efficient manner allowing false positives but no false negatives.

It can indicate if an element is definitely not in a set, or probably in a set. Bloom filters have very compact representations compared to storing the full set explicitly.

Bloom filters rely on hash functions and interval arithmetic to map set elements to array bits efficiently. They are used in systems like databases, networks and browsers for scalable set representation. Related structures like B-treesdistributed hash tables, and skip lists have comparable tradeoffs.

How does it work?

A Bloom filter consists of a bit vector initialized to 0s and uses multiple hash functions to map elements to random index positions. To add an item, the bits at the hashed indexes are flipped to 1. To check membership, you check if the bits are set at all the hashed positions.

Why is it important? Where is it used?

Bloom filters enable space-efficient representation of sets and fast membership checks with adjustable error tradeoffs. Uses include network routing, databases, caching layers, data streams and anywhere space matters.

The compact probabilistic representation enables scaling massive sets with small memory overhead. The tolerance for false positives provides the efficiency gains.

FAQ

What are the key properties of a Bloom filter?

  • Space efficiency - Uses compact bit vectors
  • O(1) add and lookup time complexity
  • Zero false negatives; adjustable false positive rate
  • No need to store full set explicitly

What are the disadvantages of Bloom filters?

  • False positives are possible
  • No exact count of members
  • Rebuilding filter needed if full set changes
  • Cryptographic applications require more advanced constructions

What parameters can be tuned in a Bloom filter?

Key parameters:

  • Vector size
  • Hash functions used
  • Number of hash functions

What are common applications of Bloom filters?

  • Databases: Join filtering, deduplication
  • Networking: Routers, CDNs
  • Security: spellchecking, attack detection
  • Genome assembly
  • Distributed caches

References:


Related Entries

Count Min Sketch

A Count Min Sketch is a probabilistic data structure used to estimate item frequencies and counts in data streams.

Read more ->
Distributed Hash Table

A distributed hash table (DHT) is a decentralized distributed system that partitions a key space across nodes and uses hash functions to assign ownership and locate data.

Read more ->
Skip List

A skip list is a probabilistic data structure that provides fast search and insertion over an ordered sequence using hierarchy of linked lists to skip over elements.

Read more ->

Get early access to AI-native data infrastructure