Skip to main content

One post tagged with "data-structures"

View All Tags

Probabilistic data structures: Why Bloom filters are one of the most elegant hack

· 14 min read
Pranjal Kumar
Software Engineer, @Bentley Systems

Bloom filter bit array visualization showing how hash functions map elements to bit positions for probabilistic set membership testing | Pranjal Kumar

Here's a question that sounds impossible: can you check if an element is in a set of 1 billion items using only 1.2 GB of RAM, with lookups completing in constant time, and never producing a false negative? Oh, and the data structure is a single contiguous array of bits; no pointers, no allocations, no garbage collection pauses.

The answer is yes. It's called a Bloom filter, and once you understand it, you'll see it everywhere; from the database engine under your app to Chrome's malicious URL checker to the network switches routing your packets.

This post goes deep. Not "here's a pretty diagram" deep. "I need to tune the false positive rate for a production system handling 400K QPS" deep.