LSM Read Path

medium · distributed-systems, storage, lsm

LSM Read Path

Implement a simplified LSM-tree read path. You have a memtable and a list of SSTables ordered from newest to oldest. Each entry has a timestamp and may be a tombstone.

Lookup rules:

  1. Check the memtable first (pick the entry with the highest timestamp for the key).
  2. If not found, check SSTables in order.
    • If MaybeHasKey is false, skip the table (Bloom filter says no).
    • Otherwise, binary search the table's entries (sorted by key).
  3. A tombstone with Timestamp >= gcBefore means the key is deleted. A tombstone with Timestamp < gcBefore is considered GC'd and ignored.

Return the value and whether it exists.

Types

type Entry struct {
    Key       string
    Value     string
    Timestamp int
    Tombstone bool
}

type SSTable struct {
    MaybeHasKey bool
    Entries     []Entry // sorted by Key ascending
}

Function signature

func LSMGet(key string, mem []Entry, tables []SSTable, gcBefore int) (value string, ok bool)

Example

mem = [{Key:"a", Value:"1", Timestamp:5}]
tables = [
  {MaybeHasKey:true, Entries:[{Key:"a", Tombstone:true, Timestamp:3}]},
]

gcBefore = 4
=> "1", true   // tombstone is older than gcBefore

Constraints

  • 0 <= len(mem) <= 100000
  • 0 <= len(tables) <= 1000
  • 0 <= Timestamp <= 1_000_000
  • Entries in each SSTable are sorted by Key

Notes

  • If the key is not found, return "" and false.
Run tests to see results
No issues detected