Arithmetic / Range Coding

Lesson, slides, and applied problem sets.


Lesson


Why this module exists

Huffman assigns each symbol a whole number of bits. Arithmetic/range coding pushes closer to entropy by encoding the entire message as a single fractional interval, so a symbol's cost is not rounded up to a whole bit. It is the natural endpoint of the theory: probabilities become exact code lengths (in the limit).


1) Interval refinement

Arithmetic coding maintains a range [low, high] inside [0, 1). Each symbol narrows the range in proportion to its probability, and the final interval uniquely identifies the message.

We simulate this with fixed-precision integers:

  • Start with [0, 2^N - 1]
  • Update using cumulative frequencies
  • Emit leading bits as soon as they are determined (renormalization)
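One refinement step can be sketched as follows. This is a minimal illustration, not the assignment's API: the name `update_interval` and the 32-bit width are choices made for this sketch (Python's big integers hide the overflow care a C implementation would need for `span * cum_hi`).

```python
# One interval-refinement step on fixed-precision integers.
# `update_interval`, PRECISION = 32 are illustrative choices.

PRECISION = 32
TOP = (1 << PRECISION) - 1  # the interval starts as [0, 2^N - 1]

def update_interval(low, high, cum_lo, cum_hi, total):
    """Narrow [low, high] to the slice owned by a symbol whose cumulative
    frequency range is [cum_lo, cum_hi) out of `total` counts."""
    span = high - low + 1
    new_high = low + (span * cum_hi) // total - 1
    new_low = low + (span * cum_lo) // total
    return new_low, new_high

# A symbol with probability 1/2 (cumulative range [0, 1) of 2)
# keeps the lower half of the interval:
low, high = update_interval(0, TOP, 0, 1, 2)  # (0, 2**31 - 1)
```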

2) Underflow (the tricky part)

When the interval straddles the midpoint, the next output bit is not yet determined. We count these pending bits and flush them, inverted, as soon as the interval commits to one half.

This is the arithmetic coding equivalent of carry handling in range coders.
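The pending-bit scheme can be sketched as the classic encoder renormalization loop (in the style of the Witten–Neal–Cleary coder; the 32-bit width and the name `renormalize` are choices for this sketch):

```python
# Encoder renormalization with pending-bit tracking (illustrative names).

PRECISION = 32
TOP = (1 << PRECISION) - 1
HALF = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)

def renormalize(low, high, pending, out):
    """Emit every bit that is already decided, then rescale the interval."""
    while True:
        if high < HALF:                          # lower half: next bit is 0
            out.append(0)
            out.extend([1] * pending)            # flush deferred opposite bits
            pending = 0
        elif low >= HALF:                        # upper half: next bit is 1
            out.append(1)
            out.extend([0] * pending)
            pending = 0
            low -= HALF
            high -= HALF
        elif low >= QUARTER and high < 3 * QUARTER:
            pending += 1                         # straddles midpoint: defer
            low -= QUARTER
            high -= QUARTER
        else:
            return low, high, pending
        low <<= 1                                # rescale: double the interval
        high = (high << 1) | 1

# Demo: an interval entirely in the upper half emits a single 1 bit,
# and the interval rescales back to full width [0, TOP].
out = []
low, high, pending = renormalize(HALF, TOP, 0, out)
```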


3) Static model

We will use a static frequency table supplied to both encoder and decoder. This isolates the core arithmetic coding logic without adaptive modeling complexity.
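A static model in this sense can be as small as a shared cumulative-frequency table; `build_model` below is a hypothetical helper, not part of any fixed API:

```python
# A static model as a cumulative-frequency table shared by both sides.
# `build_model` is a hypothetical helper for illustration.
from itertools import accumulate

def build_model(freqs):
    """freqs: dict mapping symbol -> count. Returns (symbols, cum, total),
    where symbols[i] owns the cumulative range [cum[i], cum[i+1])."""
    symbols = sorted(freqs)
    cum = [0] + list(accumulate(freqs[s] for s in symbols))
    return symbols, cum, cum[-1]

symbols, cum, total = build_model({"a": 3, "b": 1})
# "a" owns the range [0, 3) and "b" owns [3, 4), out of total = 4
```

Because the table is deterministic given the frequencies, encoder and decoder stay in lockstep without transmitting any per-symbol state.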


4) Why this matters

  • Expected code length approaches the entropy H to within a fraction of a bit per symbol.
  • Per-symbol cost tracks −log₂ p, instead of rounding up to whole bits as Huffman does.
  • Real compressors use range coding with adaptive models.
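A back-of-the-envelope example of the gap (illustrative numbers, not from the lesson): for a skewed binary source, Huffman must still spend a whole bit per symbol, while arithmetic coding can approach the entropy.

```python
# Huffman vs. entropy for a binary source with p = 0.9 (illustrative).
import math

p = 0.9
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # ≈ 0.469 bits/symbol
huffman_rate = 1.0  # two symbols: each codeword is at least one whole bit

# Arithmetic coding's rate can approach H, roughly a 53% saving here.
```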

What you will build

  • A static arithmetic coder (bit-precise)
  • Round-trip tests and adversarial cases

Key takeaways

  • Arithmetic coding is model-driven and nearly optimal.
  • Correct handling of renormalization and underflow is essential.
  • Small mistakes produce silent corruption.
