LSM Multi-Way Compaction

hard · distributed-systems, storage, lsm

LSM Multi-Way Compaction

You are compacting multiple LSM runs into a single sorted run. Runs are provided newest to oldest. Each run is sorted by key ascending.

For each key, choose the newest version by timestamp (higher wins). If timestamps are equal, the newer run wins.

Tombstones:

  • If the chosen version is a tombstone with Timestamp >= gcBefore, the key is deleted.
  • If the chosen tombstone is older than gcBefore, ignore it and consider the next newest version for that key.

Return the compacted run sorted by key ascending.

Types

type Entry struct {
    Key       string
    Value     string
    Timestamp int
    Tombstone bool
}

Function signature

func MultiWayCompaction(runs [][]Entry, gcBefore int) []Entry

Example

run0 (newest): [ {"a", "v2", 5, false}, {"b", "", 5, true} ]
run1:          [ {"a", "v1", 3, false}, {"b", "v1", 3, false} ]

gcBefore = 4
Result: [ {"a", "v2", 5, false} ]   // b is deleted by tombstone

Constraints

  • 0 <= total entries <= 200000
  • 0 <= Timestamp <= 1_000_000

Notes

  • If all versions of a key are filtered out, the key does not appear in output.
Run tests to see results
No issues detected