MLP Language Model
Lesson, slides, and applied problem sets.
MLP Language Model (Fixed Context)
Goal
Increase context length by using an MLP over a fixed window of previous tokens.
1) Windowed context
Choose a block size T (context length). For each training example (sketched in code after this list):
- input: tokens x[i : i+T]
- target: token x[i+T]
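A minimal data-preparation sketch in PyTorch, assuming the text is already encoded as a 1-D tensor of token ids; `token_ids` and the block size `T` below are illustrative placeholders:

```python
import torch

token_ids = torch.tensor([5, 2, 7, 7, 1, 3, 0, 4, 9, 2])  # hypothetical encoded text
T = 4  # block size (context length)

xs, ys = [], []
for i in range(len(token_ids) - T):
    xs.append(token_ids[i : i + T])  # input: T consecutive tokens
    ys.append(token_ids[i + T])      # target: the token that follows
X = torch.stack(xs)  # shape (N, T)
Y = torch.stack(ys)  # shape (N,)
```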
2) Embed + concatenate
Each token becomes a vector of length D. Concatenate T embeddings into a single vector of length T * D.
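A sketch of the embedding lookup and concatenation; the vocabulary size V, embedding dimension D, and batch size here are assumed values:

```python
import torch
import torch.nn as nn

V, D, T = 27, 10, 4        # assumed vocab size, embedding dim, block size
emb = nn.Embedding(V, D)   # learnable embedding table

X = torch.randint(0, V, (32, T))  # a batch of 32 context windows
E = emb(X)                        # (32, T, D) per-token embeddings
flat = E.view(32, T * D)          # (32, T * D) concatenated window vector
```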
3) MLP head
An MLP maps the T * D input vector through a hidden layer to V logits (one per vocabulary token). This is equivalent to a feed-forward classifier over the window.
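One way to write that head in PyTorch; the hidden size of 200 and the tanh nonlinearity are assumptions, not a prescription:

```python
import torch
import torch.nn as nn

V, D, T, H = 27, 10, 4, 200
mlp = nn.Sequential(
    nn.Linear(T * D, H),  # concatenated embeddings -> hidden layer
    nn.Tanh(),
    nn.Linear(H, V),      # hidden layer -> one logit per vocabulary token
)

flat = torch.randn(32, T * D)  # stands in for the concatenated embeddings
logits = mlp(flat)             # (32, V)
```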
4) Loss + training
Use cross-entropy over the target token. Train with SGD.
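A sketch of one training step under the same placeholder sizes, combining the embedding table and MLP head into a single module and updating it with plain SGD:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V, D, T, H = 27, 10, 4, 200
model = nn.Sequential(
    nn.Embedding(V, D),  # (B, T) -> (B, T, D)
    nn.Flatten(),        # (B, T, D) -> (B, T * D)
    nn.Linear(T * D, H),
    nn.Tanh(),
    nn.Linear(H, V),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.randint(0, V, (32, T))  # batch of context windows
Y = torch.randint(0, V, (32,))    # next-token targets

logits = model(X)                  # (32, V)
loss = F.cross_entropy(logits, Y)  # cross-entropy over the target token
opt.zero_grad()
loss.backward()
opt.step()
```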
5) Generation
Maintain a rolling context window (sketched in code after this list):
- start with a seed
- predict next token
- append and slide window
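A sampling-loop sketch, assuming `model`, `V`, and `T` from the training sketch above and a seed window filled with token id 0:

```python
import torch

context = [0] * T  # seed the window (e.g. with a padding/start id)
generated = []
for _ in range(20):  # generate 20 tokens
    x = torch.tensor([context])                            # (1, T)
    logits = model(x)                                      # (1, V)
    probs = torch.softmax(logits, dim=-1)
    nxt = torch.multinomial(probs, num_samples=1).item()   # sample next token
    generated.append(nxt)
    context = context[1:] + [nxt]                          # slide the window
```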
Key takeaways
- The MLP LM increases context length without attention.
- It introduces embedding concatenation and classifier heads.
- It prepares you for attention-based models by using the same loss and generation loop.
Next: self-attention.