Logs & Streaming
Lesson, slides, and applied problem sets.
Why logs
Logs are the backbone of streaming systems: durable, ordered, replayable. A log gives you a single source of truth that many consumers can read at their own pace.
Offsets and commits
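Consumers process records and periodically commit offsets so the system knows which records have been safely processed. Advance the commit only through the contiguous run of processed offsets: if offsets 5, 6, and 8 are done but 7 is not, committing past 7 would cause it to be skipped after a restart.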
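The contiguity rule can be sketched in a few lines. This is a minimal illustration, not a reference solution: the function name `advance_commit` is hypothetical, and it assumes the committed offset means "the next offset to process", so everything below it is already done.

```python
def advance_commit(committed: int, processed: set[int]) -> int:
    """Return the new commit offset.

    Assumption: `committed` is the next offset to process, and `processed`
    holds offsets that finished out of order at or after that point.
    """
    next_offset = committed
    # Walk forward only while there is no gap in the processed offsets.
    while next_offset in processed:
        next_offset += 1
    return next_offset


# Offsets 5 and 6 are done, 8 is done, 7 is still in flight.
print(advance_commit(5, {5, 6, 8}))  # 7
```

Offset 8 stays uncommitted until 7 completes, so a restart reprocesses 8 rather than skipping 7.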
Exactly-once processing
Exactly-once semantics require three mechanisms working together: idempotent writes, so the system ignores duplicates; transactions, so output records and offset commits are published atomically; and fencing, so stale producers are rejected.
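A toy in-memory model can show how the three mechanisms interact. Everything here is an assumption made for illustration: the class `TransactionalSink`, its methods, and the use of an integer epoch for fencing are hypothetical, and a real system would need durable storage and a real transaction protocol to get atomicity.

```python
class TransactionalSink:
    """Toy single-threaded sink; names and structure are hypothetical."""

    def __init__(self) -> None:
        self.records: list[str] = []  # committed output records
        self.committed_offset = 0     # next input offset to consume
        self.current_epoch = 0        # highest producer epoch issued

    def register(self) -> int:
        """Start a new producer session; all older epochs become stale."""
        self.current_epoch += 1
        return self.current_epoch

    def commit(self, epoch: int, input_offset: int, outputs: list[str]) -> bool:
        """Apply outputs and advance the offset together, or change nothing."""
        if epoch < self.current_epoch:
            return False  # fencing: reject a stale producer
        if input_offset != self.committed_offset:
            return False  # idempotence: duplicate or out-of-order commit
        # Both updates happen in one step, standing in for a transaction.
        self.records.extend(outputs)
        self.committed_offset = input_offset + 1
        return True


sink = TransactionalSink()
old = sink.register()
print(sink.commit(old, 0, ["a"]))  # True
print(sink.commit(old, 0, ["a"]))  # False (duplicate is ignored)
new = sink.register()
print(sink.commit(old, 1, ["b"]))  # False (old producer is fenced)
print(sink.commit(new, 1, ["b"]))  # True
print(sink.records)                # ['a', 'b']
```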
Consumer groups
Partitions are divided among the consumers in a group. A good assignment balances load across consumers and moves as few partitions as possible when the group rebalances.
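One simple strategy is round-robin assignment, sketched below. The function name `assign_round_robin` and the choice to sort both inputs (so every group member computes the same result) are assumptions for this example.

```python
def assign_round_robin(
    partitions: list[int], consumers: list[str]
) -> dict[str, list[int]]:
    """Deal partitions out to consumers one at a time, in sorted order."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    members = sorted(consumers)  # deterministic order for every caller
    assignment: dict[str, list[int]] = {c: [] for c in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


print(assign_round_robin([0, 1, 2, 3, 4], ["c2", "c1"]))
# {'c1': [0, 2, 4], 'c2': [1, 3]}
```

Round-robin balances counts to within one partition per consumer, but it makes no attempt to limit partition movement when membership changes.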
What you will build
- Commit offset advancement
- Consumer group partition assignment
- Exactly-once transactional processing
- Consumer lag + watermark metrics
- Windowed aggregation with watermarks
- Log compaction by key
- Idempotent producer acceptance
- Rebalance planning (partition moves)
Module Items
Offset Commit Advancement
Advance a commit offset using contiguous processed offsets.
Consumer Group Partition Assignment
Assign partitions to consumers in round-robin order.
Consumer Rebalance Plan
Compute partition moves for a consumer rebalance.
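As a rough sketch of what a rebalance plan might look like, the function below diffs an old assignment against a new one. The name `plan_moves` and the `(partition, from_consumer, to_consumer)` tuple shape are assumptions for illustration, not the exercise's required interface.

```python
from typing import Optional


def plan_moves(
    old: dict[str, list[int]], new: dict[str, list[int]]
) -> list[tuple[int, Optional[str], str]]:
    """List (partition, previous_owner, new_owner) for each changed partition.

    previous_owner is None when the partition was unassigned before.
    """
    old_owner = {p: c for c, parts in old.items() for p in parts}
    moves = []
    for consumer, parts in new.items():
        for partition in parts:
            previous = old_owner.get(partition)
            if previous != consumer:
                moves.append((partition, previous, consumer))
    return sorted(moves, key=lambda move: move[0])


old = {"c1": [0, 1, 2, 3]}
new = {"c1": [0, 2], "c2": [1, 3]}
print(plan_moves(old, new))  # [(1, 'c1', 'c2'), (3, 'c1', 'c2')]
```

The length of the returned list is one way to measure how disruptive a rebalance is.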
Streaming Metrics: Lag & Watermark
Compute consumer lag and global watermark.
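The two metrics can be sketched as follows, under stated assumptions: lag is taken as the log end offset minus the committed offset per partition, and the global watermark is taken as the minimum of the per-partition watermarks, since the slowest partition bounds overall progress. The function names are hypothetical.

```python
def consumer_lag(
    end_offsets: dict[int, int], committed: dict[int, int]
) -> dict[int, int]:
    """Per-partition lag; a partition with no commit counts from offset 0."""
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def global_watermark(partition_watermarks: dict[int, int]) -> int:
    """The slowest partition holds the global watermark back."""
    if not partition_watermarks:
        raise ValueError("no partitions to compute a watermark from")
    return min(partition_watermarks.values())


lag = consumer_lag({0: 100, 1: 50}, {0: 90, 1: 50})
print(lag)                                   # {0: 10, 1: 0}
print(sum(lag.values()))                     # 10
print(global_watermark({0: 1000, 1: 950}))   # 950
```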
Windowed Aggregation with Watermarks
Aggregate tumbling windows with watermark-based emission.
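A minimal sketch of this idea, with several assumptions made explicit: events are `(event_time, value)` pairs with integer times, the watermark is the largest event time seen so far minus an allowed lateness, a window is emitted once the watermark reaches its end, and events for already closed windows are dropped. The name `tumbling_sums` is hypothetical.

```python
from collections import defaultdict


def tumbling_sums(events, window_size, allowed_lateness=0):
    """Sum values per tumbling window, emitting windows as the watermark passes.

    Returns (emitted, still_open): emitted is a list of (window_start, total)
    in emission order; still_open maps window_start to a partial total.
    """
    open_windows = defaultdict(int)
    emitted = []
    watermark = None
    for event_time, value in events:
        start = event_time - event_time % window_size
        # Drop events whose window the watermark has already closed.
        if watermark is not None and start + window_size <= watermark:
            continue
        open_windows[start] += value
        candidate = event_time - allowed_lateness
        watermark = candidate if watermark is None else max(watermark, candidate)
        # Emit every open window whose end is at or before the watermark.
        closable = sorted(s for s in open_windows if s + window_size <= watermark)
        for s in closable:
            emitted.append((s, open_windows.pop(s)))
    return emitted, dict(open_windows)


events = [(1, 1), (4, 2), (12, 5), (3, 7), (25, 1)]
print(tumbling_sums(events, window_size=10))
# ([(0, 3), (10, 5)], {20: 1})
```

The event at time 3 arrives after the watermark has reached 12, so its window `[0, 10)` is already closed and the value 7 is discarded.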
Idempotent Producer
Accept only in-order producer sequences.
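One way to picture in-order acceptance is a tracker that remembers the last sequence number per producer. This is a sketch under assumptions: sequences start at 0 and increase by one, and the class name `SequenceTracker` is hypothetical.

```python
class SequenceTracker:
    """Accept a record only if it carries the next expected sequence number."""

    def __init__(self) -> None:
        self.last_seq: dict[str, int] = {}  # producer id -> last accepted seq

    def accept(self, producer_id: str, seq: int) -> bool:
        expected = self.last_seq.get(producer_id, -1) + 1
        if seq != expected:
            return False  # duplicate (too low) or gap (too high)
        self.last_seq[producer_id] = seq
        return True


tracker = SequenceTracker()
print(tracker.accept("p1", 0))  # True
print(tracker.accept("p1", 0))  # False (duplicate retry)
print(tracker.accept("p1", 2))  # False (sequence 1 is missing)
print(tracker.accept("p1", 1))  # True
```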
Log Compaction
Compact a log by keeping only the latest record per key.
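Compaction by key can be sketched as a single pass that keeps the last record seen for each key. The record shape `(offset, key, value)` and the function name `compact` are assumptions for this example; the input is assumed to be in offset order.

```python
def compact(log: list[tuple[int, str, str]]) -> list[tuple[int, str, str]]:
    """Keep only the latest record per key, preserving offset order."""
    latest: dict[str, tuple[int, str, str]] = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # later records overwrite earlier ones
    return sorted(latest.values(), key=lambda record: record[0])


log = [(0, "a", "1"), (1, "b", "2"), (2, "a", "3")]
print(compact(log))  # [(1, 'b', '2'), (2, 'a', '3')]
```

Surviving records keep their original offsets, so consumers can still resume from a committed offset after compaction.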
Exactly-Once Streaming Transactions
Simulate transactional exactly-once semantics with fencing and idempotence.