Huffman Coding
Lesson, slides, and applied problem sets.
Why this module exists
Huffman coding is the canonical example of optimal prefix codes when symbol probabilities are known. It is also the point where theory meets real implementation details: tie-breaking, canonical forms, and deterministic decoding tables.
1) The Huffman idea
Given symbol frequencies f_i, let p_i = f_i / sum_j f_j. Build a prefix-free code that minimizes the expected codeword length
L = sum_i p_i * l_i,
where l_i is the length of the codeword for symbol i.
Huffman does this by repeatedly merging the two least probable nodes into one. This greedy step is optimal and produces a full binary tree (every internal node has exactly two children).
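As a concrete check of the objective, here is a small hypothetical distribution (the probabilities, symbols, and code are illustrative, not from the lesson) with an optimal prefix code for it:

```python
# Hypothetical three-symbol distribution and an optimal prefix code.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
code = {"a": "0", "b": "10", "c": "11"}

# Expected codeword length L = sum_i p_i * l_i
L = sum(p[s] * len(code[s]) for s in p)
print(L)  # 1.5 bits per symbol
```

For this distribution the entropy is also 1.5 bits, so the Huffman code is not just optimal among prefix codes; it matches the entropy exactly because every probability is a power of 1/2.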
2) Tree construction and tie-breaking
The algorithm is simple; the determinism is not.
If two nodes have equal weight, you must define a tie-breaker (e.g., smallest symbol id). Otherwise, different runs can generate different but still optimal codes, which breaks tests and canonicalization.
We will use:
- Primary: lower frequency first
- Tie: lower minimum symbol id in the subtree
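The rules above can be encoded directly in the heap ordering. The sketch below (function name and tree representation are my own choices: leaves are symbol ids, internal nodes are `(left, right)` tuples) uses `(frequency, min symbol id in subtree)` as the heap key; since live subtrees hold disjoint symbol sets, their minimum ids are distinct and the ordering is fully deterministic:

```python
import heapq

def build_tree(freqs):
    """Build a Huffman tree with deterministic tie-breaking.

    freqs: dict mapping symbol id -> frequency.
    Returns nested tuples: leaves are symbol ids, internal
    nodes are (left, right) pairs.
    """
    # Heap entries: (frequency, min symbol id in subtree, node).
    heap = [(f, sym, sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Lowest (freq, min id) becomes the left child.
        f1, m1, n1 = heapq.heappop(heap)
        f2, m2, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, min(m1, m2), (n1, n2)))
    return heap[0][2]
```

Because ties on frequency fall through to the minimum symbol id, two nodes never compare equal, and repeated runs on the same input always produce the same tree.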
3) From tree to code
Assign 0 to left edges, 1 to right edges. The path from root to a leaf gives the code.
If only one symbol exists, assign it the 1-bit code 0. This avoids a zero-length code.
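A sketch of this traversal, assuming the tree is represented as nested pairs with symbol ids at the leaves (the representation and function name are assumptions, not part of the lesson):

```python
def make_codes(tree):
    """Walk the tree: 0 on left edges, 1 on right edges.

    tree: a leaf (symbol id) or a (left, right) tuple.
    Returns a dict mapping symbol id -> bit string.
    """
    if not isinstance(tree, tuple):
        # Single-symbol alphabet: force the 1-bit code "0".
        return {tree: "0"}
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(tree, "")
    return codes
```

Handling the one-leaf case up front keeps the recursive walk simple: every other leaf sits at depth at least 1, so its prefix is never empty.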
4) Canonical Huffman
Canonical Huffman replaces tree shape with a deterministic code assignment based only on code lengths.
Steps:
- Sort symbols by (code length, symbol id).
- Assign codes in that order, incrementing the code as an integer after each symbol.
- Left-shift the running code each time the length increases.
Benefits:
- Smaller headers (only lengths are sent)
- Fast decode tables
- Deterministic output
5) Practical boundaries
- Code lengths are bounded by the number of symbols and total weight.
- For real formats, lengths are often capped (e.g., 15 in DEFLATE).
- We will not length-limit, but our constraints keep lengths small.
What you will build
- Huffman tree build + code table
- Canonical Huffman from code lengths
Key takeaways
- Huffman is optimal for a known distribution.
- Tie-breaking matters for reproducibility.
- Canonical form is the industry standard for compact headers.