Foundations & Encodings
Lesson, slides, and applied problem sets.
Blockchain Foundations & Encodings
Why this module exists
Blockchains are data-first systems. Before consensus or mining matters, you must be able to serialize deterministically and decode unambiguously. A single ambiguous bit can fork consensus, because nodes that hash different bytes compute different identifiers.
1) Canonical bytes are the protocol
Every object (transaction, block header, address) has a canonical byte layout. Validators don't interpret your structs—they hash bytes. If two nodes serialize slightly differently, they will disagree on txids and merkle roots.
Key properties:
- Deterministic ordering and encoding
- Explicit lengths for variable fields
- Little-endian integers in Bitcoin serialization
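To make these properties concrete, here is a minimal sketch in Python. The record layout (`serialize_record`, with a 32-bit version, a 64-bit amount, and a length-prefixed memo) is a hypothetical toy format invented for illustration, not a real Bitcoin structure. It only shows that the same logical values hash differently once the byte layout changes.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, the hash Bitcoin uses for txids and block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def serialize_record(version: int, amount: int, memo: bytes) -> bytes:
    """Hypothetical toy layout: u32 LE version | u64 LE amount | u8 length | memo."""
    if len(memo) > 0xFF:
        raise ValueError("memo too long for a 1-byte length prefix")
    # "<" selects little-endian with no padding between fields.
    return struct.pack("<IQ", version, amount) + bytes([len(memo)]) + memo


canonical = serialize_record(1, 5000, b"hi")

# Same logical values, but the integers are written big-endian by mistake.
wrong_endian = struct.pack(">IQ", 1, 5000) + bytes([2]) + b"hi"

print(canonical.hex())
print(double_sha256(canonical) == double_sha256(wrong_endian))
```

The two serializations carry identical field values, yet their hashes differ, so two nodes using them would disagree on the object's identifier.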
2) CompactSize varint
Bitcoin encodes counts and lengths using CompactSize:
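- 1 byte for values 0 through 252 (0xfc)
- A prefix byte of 0xfd, 0xfe, or 0xff followed by a little-endian 16-, 32-, or 64-bit integer for larger values
- Non-canonical encodings (a longer form than the value requires) are invalid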
Canonicality prevents multiple encodings for the same number, which prevents malleability and ambiguity in parsing.
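The rules above can be sketched in Python as follows. The function names `encode_compact_size` and `decode_compact_size` are illustrative choices, and this is one possible shape for the exercise rather than a reference implementation. The decoder rejects any value that would have fit in a shorter form.

```python
import struct
from typing import Tuple

# prefix byte -> (struct format, payload size, smallest value allowed in this form)
_FORMS = {
    0xFD: ("<H", 2, 0xFD),
    0xFE: ("<I", 4, 0x10000),
    0xFF: ("<Q", 8, 0x100000000),
}


def encode_compact_size(n: int) -> bytes:
    """Encode n using the shortest CompactSize form."""
    if n < 0 or n > 0xFFFFFFFFFFFFFFFF:
        raise ValueError("value out of range for CompactSize")
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def decode_compact_size(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset); reject truncated or non-canonical input."""
    if offset >= len(buf):
        raise ValueError("truncated CompactSize")
    first = buf[offset]
    if first < 0xFD:
        return first, offset + 1
    fmt, size, minimum = _FORMS[first]
    end = offset + 1 + size
    if end > len(buf):
        raise ValueError("truncated CompactSize")
    (value,) = struct.unpack_from(fmt, buf, offset + 1)
    if value < minimum:
        # The value would have fit in a shorter form.
        raise ValueError("non-canonical CompactSize")
    return value, end


print(encode_compact_size(253).hex())       # fdfd00
print(decode_compact_size(b"\xfd\xfd\x00"))  # (253, 3)
```

Returning the next offset alongside the value lets a parser walk a buffer field by field without slicing it.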
3) Base58Check
Base58Check is a human-friendly encoding with a built-in checksum:
- Avoids visually confusing characters (0/O, I/l)
- Preserves leading zero bytes via leading 1s
- Detects typos via 4-byte checksum
Note: we use Base58Check to teach encoding, not wallet security.
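A minimal Python sketch of the scheme described above, assuming the usual Bitcoin layout of one version byte, the payload, and the first four bytes of a double SHA-256 as the checksum. The function names are illustrative, and the code is written for clarity rather than speed.

```python
import hashlib
from typing import Tuple

# 58 characters: digits and letters without 0, O, I, and l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def _checksum(data: bytes) -> bytes:
    """First 4 bytes of SHA-256(SHA-256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def base58check_encode(version: int, payload: bytes) -> str:
    data = bytes([version]) + payload
    data += _checksum(data)
    num = int.from_bytes(data, "big")
    chars = []
    while num > 0:
        num, rem = divmod(num, 58)
        chars.append(ALPHABET[rem])
    # Each leading zero byte is lost in the integer, so emit it as a '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(chars))


def base58check_decode(text: str) -> Tuple[int, bytes]:
    """Return (version, payload); raise ValueError on bad characters or checksum."""
    num = 0
    for ch in text:
        idx = ALPHABET.find(ch)
        if idx < 0:
            raise ValueError(f"invalid Base58 character: {ch!r}")
        num = num * 58 + idx
    pad = len(text) - len(text.lstrip("1"))
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    data = b"\x00" * pad + body
    if len(data) < 5:
        raise ValueError("too short to hold a version byte and checksum")
    versioned, check = data[:-4], data[-4:]
    if _checksum(versioned) != check:
        raise ValueError("checksum mismatch")
    return versioned[0], versioned[1:]


addr = base58check_encode(0x00, bytes(20))  # version 0, 20 zero bytes
print(addr)
print(base58check_decode(addr) == (0, bytes(20)))
```

With version 0 and an all-zero 20-byte payload, the 21 leading zero bytes become 21 leading `1` characters, which makes the zero-byte preservation rule easy to see.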
4) Practical pitfalls
- Incorrect endian handling produces a wrong txid or merkle root.
- Accepting non-canonical encodings can lead to consensus splits.
- Address encoding is presentation, not consensus—be clear which layer you are in.
What you will build
- CompactSize encoding/decoding with canonical checks
- Base58Check encoding/decoding with checksum validation
Key takeaways
- Bytes are the protocol boundary.
- Canonical encoding is a consensus rule.
- Base58Check is about robustness of human input.