DB Internals Foundations

Lesson, slides, and applied problem sets.

View Slides

Lesson

DB Internals Foundations

Why this module exists

Databases are not magic; they are carefully layered storage engines with strict invariants. If you do not understand page layout, WAL ordering, and durability semantics, you cannot reason about correctness or performance.

This pack treats the DB as a system: pages on disk, cached in memory, modified under transactional rules, and recovered after crashes.


1) The storage stack

At minimum, a storage engine includes:

  • Disk pages: fixed-size blocks on storage
  • Buffer pool: in-memory cache of pages
  • WAL (write-ahead log): an append-only log of changes
  • Index: e.g., B+Tree mapping keys to records
  • Transactions: rules that define atomicity and isolation

Each layer enforces invariants. The WAL guarantees durability; the buffer pool makes pages fast; the index keeps lookups cheap.


2) WAL and the durability rule

The core rule is simple:

WAL rule: a log record describing a change must be written to stable storage before the page containing that change is written.

This single rule makes recovery possible. It allows the system to replay changes after a crash.


3) Steal / no-force and why logs exist

Two policies define how pages and commits behave:

  • Steal: dirty pages may be written before commit
  • No-force: committed pages may stay in memory without immediate flush

Modern DBs use steal + no-force because it performs well. WAL is the mechanism that makes it safe.


4) LSNs and ordering

Each log record has a log sequence number (LSN). Pages store the LSN of the latest change applied to them. During recovery:

  • Redo applies updates with LSN > pageLSN
  • Undo rolls back updates from uncommitted transactions

LSNs are the backbone of recovery ordering.


5) What you will build

  • A log record encoder/decoder with explicit fields.
  • Later modules: WAL recovery and integration into a capstone engine.

Key takeaways

  • WAL is the durability contract of the storage engine.
  • Steal/no-force requires careful logging and recovery.
  • LSNs are the ordering mechanism for correctness.

Module Items