Transform Coding

Lesson, slides, and applied problem sets.

View Slides

Lesson

Transform Coding: BWT, MTF, RLE

Why this module exists

Transforms don't compress by themselves; they reshape data to make it easier to compress with simple coders. BWT clusters similar symbols, MTF turns locality into small integers, and RLE compresses runs of zeros. Together, they create a pipeline that helps entropy coders.


1) Burrows-Wheeler Transform (BWT)

BWT sorts all cyclic rotations of the string and returns the last column plus the index of the original string in the sorted order.

Properties:

  • It is reversible with the primary index.
  • It tends to cluster similar characters together.

This does not reduce entropy, but it creates long runs, which later stages exploit.


2) Move-to-Front (MTF)

MTF maintains an ordered list of symbols. For each symbol:

  • Output its index in the list.
  • Move the symbol to the front.

After BWT, repeated characters map to many zeros or small indices, which are cheap to encode.


3) Run-length encoding (RLE)

RLE replaces runs with length counts. We will use a zero-run encoding to compress the long runs produced by MTF:

  • A run of k zeros is encoded as a single negative value -k.

4) Pipeline intuition

BWT -> MTF -> RLE creates a sparse, skewed distribution:

  • Many zeros (cheap for RLE)
  • Small integers (cheap for entropy coding)

What you will build

  1. BWT forward/inverse (with primary index)
  2. MTF + zero-run RLE pipeline

Key takeaways

  • Transforms rearrange structure; coders exploit it.
  • Reversibility must be exact.
  • The pipeline only works when each stage preserves information.

Module Items