MLP Language Model
Lesson, slides, and applied problem sets.
MLP Language Model (Fixed Context)
Goal
Increase context length by using an MLP over a fixed window of previous tokens.
1) Windowed context
Choose a block size T (context length). For each training example (sketched in code after this list):
- input: tokens x[i : i+T]
- target: token x[i+T]
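A minimal data-preparation sketch in PyTorch, assuming the text is already encoded as a 1-D tensor of token ids; `token_ids` and the block size `T` below are illustrative placeholders:

```python
import torch

token_ids = torch.tensor([5, 2, 7, 7, 1, 3, 0, 4, 9, 2])  # hypothetical encoded text
T = 4  # block size (context length)

xs, ys = [], []
for i in range(len(token_ids) - T):
    xs.append(token_ids[i : i + T])  # input: T consecutive tokens
    ys.append(token_ids[i + T])      # target: the token that follows
X = torch.stack(xs)  # shape (N, T)
Y = torch.stack(ys)  # shape (N,)
```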
2) Embed + concatenate
Each token becomes a vector of length D. Concatenate T embeddings into a single vector of length T * D.
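A sketch of the embedding lookup and concatenation; the vocabulary size V, embedding dimension D, and batch size here are assumed values:

```python
import torch
import torch.nn as nn

V, D, T = 27, 10, 4        # assumed vocab size, embedding dim, block size
emb = nn.Embedding(V, D)   # learnable embedding table

X = torch.randint(0, V, (32, T))  # a batch of 32 context windows
E = emb(X)                        # (32, T, D) per-token embeddings
flat = E.view(32, T * D)          # (32, T * D) concatenated window vector
```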
3) MLP head
An MLP maps the T * D input vector through a hidden layer to V logits (one per vocabulary token). This is equivalent to a feed-forward classifier over the window.
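One way to write that head in PyTorch; the hidden size of 200 and the tanh nonlinearity are assumptions, not a prescription:

```python
import torch
import torch.nn as nn

V, D, T, H = 27, 10, 4, 200
mlp = nn.Sequential(
    nn.Linear(T * D, H),  # concatenated embeddings -> hidden layer
    nn.Tanh(),
    nn.Linear(H, V),      # hidden layer -> one logit per vocabulary token
)

flat = torch.randn(32, T * D)  # stands in for the concatenated embeddings
logits = mlp(flat)             # (32, V)
```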
4) Loss + training
Use cross-entropy over the target token. Train with SGD.
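A sketch of one training step under the same placeholder sizes, combining the embedding table and MLP head into a single module and updating it with plain SGD:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V, D, T, H = 27, 10, 4, 200
model = nn.Sequential(
    nn.Embedding(V, D),  # (B, T) -> (B, T, D)
    nn.Flatten(),        # (B, T, D) -> (B, T * D)
    nn.Linear(T * D, H),
    nn.Tanh(),
    nn.Linear(H, V),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.randint(0, V, (32, T))  # batch of context windows
Y = torch.randint(0, V, (32,))    # next-token targets

logits = model(X)                  # (32, V)
loss = F.cross_entropy(logits, Y)  # cross-entropy over the target token
opt.zero_grad()
loss.backward()
opt.step()
```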
5) Generation
Maintain a rolling context window (sketched in code after this list):
- start with a seed
- predict next token
- append and slide window
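A sampling-loop sketch, assuming `model`, `V`, and `T` from the training sketch above and a seed window filled with token id 0:

```python
import torch

context = [0] * T  # seed the window (e.g. with a padding/start id)
generated = []
for _ in range(20):  # generate 20 tokens
    x = torch.tensor([context])                            # (1, T)
    logits = model(x)                                      # (1, V)
    probs = torch.softmax(logits, dim=-1)
    nxt = torch.multinomial(probs, num_samples=1).item()   # sample next token
    generated.append(nxt)
    context = context[1:] + [nxt]                          # slide the window
```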
Key takeaways
- The MLP LM increases context length without attention.
- It introduces embedding concatenation and classifier heads.
- It prepares you for attention-based models by using the same loss and generation loop.
Next: self-attention.