LSM Memtable + Merge (C)
Topics: storage engines, ordered data, k-way merge, tombstones
Problem
Model the write path of a log-structured merge (LSM) store. Writes land in an
in-memory memtable; deletes are recorded as tombstones (not removals).
Flushing produces a key-sorted run, and compaction k-way-merges several runs
(oldest → newest) into one. Keys and values are fixed-size strings over
caller-provided storage (no allocation).
typedef struct { char key[32]; char val[32]; int tombstone; } Entry;
typedef struct { Entry *entries; size_t cap; size_t count; } Memtable;
void mt_init(Memtable *m, Entry *storage, size_t cap);
void mt_set(Memtable *m, const char *key, const char *val); // insert/replace
void mt_delete(Memtable *m, const char *key); // record a tombstone
bool mt_get(Memtable *m, const char *key, char *out, size_t outcap); // false if absent/deleted
size_t mt_flush(Memtable *m, Entry *out, size_t maxout); // sorted, tombstones included
size_t lsm_merge(const Entry *const runs[], const size_t *runlens, size_t nruns,
Entry *out, size_t maxout);
- Memtable: one entry per live key.
mt_set replaces; mt_delete keeps the
key but flags it a tombstone; mt_get treats a tombstone as absent.
mt_flush: emit every entry (tombstones included) sorted by key.
lsm_merge: walk all runs by ascending key. For the smallest key across
runs, the entry from the latest run wins (runs are ordered oldest→newest);
advance every cursor sitting on that key. If the winner is a tombstone, skip it.
Key concepts
- Tombstones, not deletes: a delete in a newer run must shadow the same key in
older runs — so deletion is data, carried through flush and merge until compaction
drops it.
- Newest wins per key: the merge resolves conflicts by recency. A later
set resurrects a key an earlier run deleted; a later tombstone removes a key an
earlier run set.
- k-way merge: repeatedly pick the minimum current key across cursors — the
same shape as merging sorted SSTables in a real engine.
Run
cc -std=c11 -O2 -Wall tests.c -o /tmp/lsm && /tmp/lsm
Sign in to submit your solution.