Foundations & Encodings

Lesson, slides, and applied problem sets.

Blockchain Foundations & Encodings

Why this module exists

Blockchains are data-first systems. Before consensus or mining matters, you must be able to serialize deterministically and decode unambiguously. Nodes compare hashes of serialized bytes, so even a single ambiguous bit can cause a consensus fork.


1) Canonical bytes are the protocol

Every object (transaction, block header, address) has a canonical byte layout. Validators don't interpret your structs; they hash bytes. If two nodes serialize the same object even slightly differently, they will compute different txids and merkle roots.

Key properties:

  • Deterministic ordering and encoding
  • Explicit lengths for variable fields
  • Little-endian integers in Bitcoin serialization
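
As a minimal sketch of these properties, the following serializes an 80-byte block header with little-endian integers and hashes the resulting bytes. The helper names (`double_sha256`, `serialize_header`) are hypothetical and chosen for illustration; the field values are those of the Bitcoin genesis block. Hashes are conventionally displayed in reversed byte order, which is why the hex inputs are reversed before serialization and the digest is reversed for display.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for block hashes and txids."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def serialize_header(version, prev_hash_hex, merkle_root_hex,
                     timestamp, bits, nonce) -> bytes:
    """Hypothetical helper: lay out header fields in a fixed order.

    Integers are packed little-endian; the two hashes are given in
    display order and reversed into their serialized byte order.
    """
    return (
        struct.pack("<i", version)
        + bytes.fromhex(prev_hash_hex)[::-1]
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack("<III", timestamp, bits, nonce)
    )


# Field values of the Bitcoin genesis block.
header = serialize_header(
    version=1,
    prev_hash_hex="00" * 32,
    merkle_root_hex=(
        "4a5e1e4baab89f3a32518a88c31bc87f"
        "618f76673e2cc77ab2127b7afdeda33b"
    ),
    timestamp=1231006505,
    bits=0x1D00FFFF,
    nonce=2083236893,
)

# Reverse the digest to get the conventional display form.
block_hash = double_sha256(header)[::-1].hex()
print(block_hash)

# Packing just the version field big-endian changes the bytes,
# and therefore the hash.
wrong_header = struct.pack(">i", 1) + header[4:]
wrong_hash = double_sha256(wrong_header)[::-1].hex()
print(wrong_hash != block_hash)  # True
```

The same field values produce a different hash as soon as one integer is packed with the wrong byte order, which is the sense in which the bytes, not the struct, define the object.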

2) CompactSize varint

Bitcoin encodes counts and lengths using CompactSize:

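  • 1 byte for values 0 to 252 (0xFC)
  • Prefix byte 0xFD, 0xFE, or 0xFF followed by a little-endian 16-, 32-, or 64-bit integer for larger values
  • Non-canonical encodings (a longer form used where a shorter one would fit) are invalid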

Canonicality prevents multiple encodings for the same number, which prevents malleability and ambiguity in parsing.
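
The rules above can be sketched as follows. The function names (`encode_compact_size`, `decode_compact_size`) are hypothetical, and this is a teaching sketch rather than a full node implementation: it checks canonicality and truncation only, and assumes no additional upper bound on the decoded value.

```python
import struct
from typing import Tuple

# prefix byte -> (payload width, struct format, smallest value allowed)
_FORMS = {
    0xFD: (2, "<H", 0xFD),
    0xFE: (4, "<I", 0x10000),
    0xFF: (8, "<Q", 0x100000000),
}


def encode_compact_size(n: int) -> bytes:
    """Encode n using the shortest form that can hold it."""
    if n < 0 or n > 0xFFFFFFFFFFFFFFFF:
        raise ValueError("value out of range for CompactSize")
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def decode_compact_size(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset); reject truncated or non-canonical input."""
    if offset >= len(data):
        raise ValueError("truncated CompactSize")
    prefix = data[offset]
    if prefix < 0xFD:
        return prefix, offset + 1
    width, fmt, minimum = _FORMS[prefix]
    end = offset + 1 + width
    if end > len(data):
        raise ValueError("truncated CompactSize")
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    if value < minimum:
        # A shorter form could have encoded this value.
        raise ValueError("non-canonical CompactSize")
    return value, end


print(encode_compact_size(252).hex())    # fc
print(encode_compact_size(253).hex())    # fdfd00
print(encode_compact_size(65536).hex())  # fe00000100
```

Returning the next offset alongside the value lets a parser walk through a larger byte string without slicing, for example when reading an input count followed by the inputs themselves.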


3) Base58Check

Base58Check is a human-friendly encoding with a built-in checksum:

  • Avoids visually confusing characters (0/O, I/l)
  • Preserves leading zero bytes via leading 1s
  • Detects typos via a 4-byte checksum (the first 4 bytes of a double SHA-256 of the payload)

Note: we use Base58Check to teach encoding, not wallet security.
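
The three properties listed above can be sketched in a few lines. The function names (`base58check_encode`, `base58check_decode`) are hypothetical; the payload passed in is assumed to already include any version byte. This is an illustration of the encoding only, not a wallet-grade implementation.

```python
import hashlib

# 58 characters: no 0, O, I, or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def _checksum(payload: bytes) -> bytes:
    """First 4 bytes of SHA-256(SHA-256(payload))."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def base58check_encode(payload: bytes) -> str:
    data = payload + _checksum(payload)
    num = int.from_bytes(data, "big")
    chars = []
    while num > 0:
        num, rem = divmod(num, 58)
        chars.append(ALPHABET[rem])
    # Each leading zero byte is written as one leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))


def base58check_decode(text: str) -> bytes:
    num = 0
    for ch in text:
        index = ALPHABET.find(ch)
        if index < 0:
            raise ValueError("invalid Base58 character: %r" % ch)
        num = num * 58 + index
    # Each leading '1' restores one leading zero byte.
    leading_ones = len(text) - len(text.lstrip("1"))
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    data = b"\x00" * leading_ones + body
    if len(data) < 4:
        raise ValueError("too short to contain a checksum")
    payload, checksum = data[:-4], data[-4:]
    if _checksum(payload) != checksum:
        raise ValueError("checksum mismatch")
    return payload


# Version byte 0x00 followed by a 20-byte all-zero hash.
address = base58check_encode(b"\x00" * 21)
print(address)  # 1111111111111111111114oLvT2
print(base58check_decode(address) == b"\x00" * 21)  # True
```

Changing any single character of the encoded string, for example the final `2` to `3`, makes `base58check_decode` raise `ValueError` because the recomputed checksum no longer matches.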


4) Practical pitfalls

  • Incorrect endianness handling produces the wrong txid or merkle root.
  • Accepting non-canonical encodings can lead to consensus splits.
  • Address encoding is presentation, not consensus. Be clear which layer you are in.

What you will build

  1. CompactSize encoding/decoding with canonical checks
  2. Base58Check encoding/decoding with checksum validation

Key takeaways

  • Bytes are the protocol boundary.
  • Canonical encoding is a consensus rule.
  • Base58Check is about robustness of human input.
