MLP Language Model

Lesson, slides, and applied problem sets.

Lesson

MLP Language Model (Fixed Context)

Goal

Increase context length by using an MLP over a fixed window of previous tokens.


1) Windowed context

Choose a block size T (the context length). For each training example (see the sketch after this list):

  • input: tokens x[i : i+T]
  • target: token x[i+T]
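
A minimal PyTorch sketch of slicing a token-id sequence into (input, target) windows; the toy sequence and block size are illustrative.

    import torch

    T = 4                                    # block size (context length)
    tokens = torch.tensor([5, 2, 7, 7, 1, 3, 0, 4, 9, 2])   # toy token-id sequence

    # Every position i yields one example: a window of T inputs and the token that follows it.
    inputs  = torch.stack([tokens[i : i + T] for i in range(len(tokens) - T)])
    targets = torch.stack([tokens[i + T]     for i in range(len(tokens) - T)])
    # inputs: (N, T) windows of token ids, targets: (N,) next-token ids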

2) Embed + concatenate

Each token is mapped to a learned embedding vector of length D. Concatenating the T embeddings in the window gives a single input vector of length T * D.
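
A sketch of the lookup-and-concatenate step, assuming illustrative sizes V, D, T and a batch of 32 windows.

    import torch
    import torch.nn as nn

    V, D, T = 27, 8, 4                       # vocab size, embedding dim, block size (illustrative)
    emb = nn.Embedding(V, D)                 # learned lookup table: token id -> vector of length D

    x = torch.randint(0, V, (32, T))         # a batch of 32 windows of token ids
    e = emb(x)                               # (32, T, D): one embedding per token in the window
    e = e.view(32, T * D)                    # concatenate the T embeddings -> (32, T * D)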


3) MLP head

The MLP maps the concatenated T * D vector through a hidden layer to V logits, one per vocabulary token. In effect it is a feed-forward classifier over the window.
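
A sketch of such an MLP head; the hidden width H and the tanh nonlinearity are assumptions made for illustration.

    import torch
    import torch.nn as nn

    V, D, T, H = 27, 8, 4, 128               # illustrative sizes; H is the hidden width

    mlp = nn.Sequential(
        nn.Linear(T * D, H),                  # concatenated embeddings -> hidden layer
        nn.Tanh(),
        nn.Linear(H, V),                      # hidden layer -> one logit per vocabulary token
    )

    x = torch.randn(32, T * D)                # stand-in for a batch of concatenated embeddings
    logits = mlp(x)                           # (32, V)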


4) Loss + training

Compute the cross-entropy loss between the logits and the target token. Train with SGD.
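
One training step as a sketch, reusing the illustrative emb and mlp pieces from above and a random toy batch in place of real data.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    V, D, T, H = 27, 8, 4, 128
    emb = nn.Embedding(V, D)
    mlp = nn.Sequential(nn.Linear(T * D, H), nn.Tanh(), nn.Linear(H, V))
    opt = torch.optim.SGD(list(emb.parameters()) + list(mlp.parameters()), lr=0.1)

    xb = torch.randint(0, V, (32, T))         # batch of input windows (toy data)
    yb = torch.randint(0, V, (32,))           # batch of target tokens (toy data)

    logits = mlp(emb(xb).view(32, T * D))     # forward pass: embed, concatenate, classify
    loss = F.cross_entropy(logits, yb)        # cross-entropy against the target token

    opt.zero_grad()
    loss.backward()                           # backprop
    opt.step()                                # SGD update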


5) Generation

Maintain a rolling context window (see the sketch after this list):

  • start with a seed
  • predict next token
  • append and slide window
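
A sketch of the rolling-window loop; emb and mlp stand in for a trained model, and the seed of zeros is an assumed start context.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    V, D, T, H = 27, 8, 4, 128
    emb = nn.Embedding(V, D)                  # would hold trained weights in practice
    mlp = nn.Sequential(nn.Linear(T * D, H), nn.Tanh(), nn.Linear(H, V))

    context = [0] * T                         # seed window (assumed start/padding token ids)
    out = []
    for _ in range(20):
        x = torch.tensor([context])                       # (1, T)
        logits = mlp(emb(x).view(1, T * D))               # (1, V)
        probs = F.softmax(logits, dim=-1)
        nxt = torch.multinomial(probs, num_samples=1).item()  # sample the next token
        out.append(nxt)
        context = context[1:] + [nxt]                     # slide the window forward by one
    print(out)

Because the window is fixed at T tokens, anything older than T steps cannot influence the next prediction; this is the limitation that attention removes.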

Key takeaways

  1. An MLP language model increases context length without attention.
  2. It introduces embedding concatenation and classifier heads.
  3. It prepares you for attention-based models by using the same loss and generation loop.

Next: self-attention.

