DB Internals Foundations
Lesson, slides, and applied problem sets.
View SlidesLesson
DB Internals Foundations
Why this module exists
Databases are not magic; they are carefully layered storage engines with strict invariants. If you do not understand page layout, WAL ordering, and durability semantics, you cannot reason about correctness or performance.
This pack treats the DB as a system: pages on disk, cached in memory, modified under transactional rules, and recovered after crashes.
1) The storage stack
At minimum, a storage engine includes:
- Disk pages: fixed-size blocks on storage
- Buffer pool: in-memory cache of pages
- WAL (write-ahead log): an append-only log of changes
- Index: e.g., B+Tree mapping keys to records
- Transactions: rules that define atomicity and isolation
Each layer enforces invariants. The WAL guarantees durability; the buffer pool makes pages fast; the index keeps lookups cheap.
2) WAL and the durability rule
The core rule is simple:
WAL rule: a log record describing a change must be written to stable storage before the page containing that change is written.
This single rule makes recovery possible. It allows the system to replay changes after a crash.
3) Steal / no-force and why logs exist
Two policies define how pages and commits behave:
- Steal: dirty pages may be written before commit
- No-force: committed pages may stay in memory without immediate flush
Modern DBs use steal + no-force because it performs well. WAL is the mechanism that makes it safe.
4) LSNs and ordering
Each log record has a log sequence number (LSN). Pages store the LSN of the latest change applied to them. During recovery:
- Redo applies updates with LSN > pageLSN
- Undo rolls back updates from uncommitted transactions
LSNs are the backbone of recovery ordering.
5) What you will build
- A log record encoder/decoder with explicit fields.
- Later modules: WAL recovery and integration into a capstone engine.
Key takeaways
- WAL is the durability contract of the storage engine.
- Steal/no-force requires careful logging and recovery.
- LSNs are the ordering mechanism for correctness.