Bloom Filter (C)
Topics: probabilistic data structures, bit manipulation, hashing
Problem
Implement a Bloom filter: a compact, probabilistic set that can say "definitely
not present" or "possibly present," using a caller-provided array of 64-bit words
(no allocation).
typedef struct { uint64_t *bits; size_t nbits; int k; } Bloom;
void bloom_init(Bloom *b, uint64_t *storage, size_t nwords, int k);
void bloom_add(Bloom *b, const char *s);
bool bloom_maybe(const Bloom *b, const char *s); // false = definitely absent
Key concepts
- k bits per item. Adding an item hashes it
k ways and sets those k bits in
the array. A lookup hashes the same k ways and reports "maybe present" only if
all k bits are set.
- No false negatives, some false positives. If you added it, every one of its
bits is set, so
bloom_maybe can never wrongly say "absent." But unrelated items
may happen to set the same bits, so it can wrongly say "maybe present." That
asymmetry is the whole value: a cheap pre-filter that lets you skip the expensive
lookup for the (common) definitely-absent case.
- Double hashing for k positions. You do not need k independent hash functions.
Compute two base hashes
h1, h2 and derive position i as h1 + i*h2 (mod the
bit count). Keep h2 odd so it strides the whole table.
- Set/test a bit: word
i >> 6, bit i & 63.
Bloom filters guard caches, databases (to skip disk reads for missing keys), and
spell-checkers, anywhere "is this maybe in the set?" is worth answering in a few
bits.
Run
cc -std=c11 -O2 -Wall tests.c -o /tmp/bloom && /tmp/bloom
Sign in to submit your solution.