Mini-GPT

Goal

End-to-end GPT-style language model: embeddings -> transformer blocks -> logits -> generation

Architecture

Token IDs
  -> Token Embedding + Position Embedding
  -> TransformerBlock x N
  -> Final LayerNorm
  -> Linear -> Vocab logits

Shapes: input token IDs (T,), logits (T, V). In practice a leading batch dimension is added: (B, T) in, (B, T, V) out.
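
A minimal PyTorch sketch of this stack (illustrative, not the pack's exact code). MiniGPT, d_model, and n_layers are assumed names; TransformerBlock is assumed to be defined elsewhere (a single-head version is sketched under Note below).

import torch
import torch.nn as nn

class MiniGPT(nn.Module):
    def __init__(self, vocab_size, max_seq_len, d_model, n_layers):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)    # token IDs -> vectors
        self.pos_emb = nn.Embedding(max_seq_len, d_model)   # learned positions
        self.blocks = nn.ModuleList(
            [TransformerBlock(d_model) for _ in range(n_layers)]
        )
        self.ln_f = nn.LayerNorm(d_model)                   # final LayerNorm
        self.head = nn.Linear(d_model, vocab_size)          # -> vocab logits

    def forward(self, idx):                                 # idx: (B, T) token IDs
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)            # (T,)
        h = self.tok_emb(idx) + self.pos_emb(pos)           # (B, T, d_model)
        for block in self.blocks:
            h = block(h)
        return self.head(self.ln_f(h))                      # (B, T, vocab_size)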

Next-token objective

Targets are the input sequence shifted one position to the left, so position t predicts token t+1. The loss is the mean cross-entropy over all positions.
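
A toy example of building the shifted targets and computing the loss; the token values and vocab size here are made up.

import torch
import torch.nn.functional as F

tokens = torch.tensor([5, 2, 9, 4, 7])     # toy token IDs, vocab size 10
x = tokens[:-1]                            # input:   [5, 2, 9, 4]
y = tokens[1:]                             # targets: [2, 9, 4, 7]  (shifted by one)

logits = torch.randn(len(x), 10)           # stand-in for model(x), shape (T, V)
loss = F.cross_entropy(logits, y)          # mean cross-entropy over the T positions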

Training loop

logits = model(x)                                   # (B, T, V)
loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
optimizer.zero_grad()                               # clear old gradients
loss.backward()                                     # backprop through the model
optimizer.step()                                    # update weights
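
The same step in a runnable loop, assuming F is torch.nn.functional, an AdamW optimizer (choice and learning rate are illustrative), and a hypothetical get_batch() helper returning (x, y) batches of shape (B, T):

import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(num_steps):
    x, y = get_batch()                               # get_batch() is a stand-in
    logits = model(x)                                # (B, T, V)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()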

Generation

  • Crop the context to the last max_seq_len tokens
  • Scale the last position's logits by temperature, then softmax
  • Sample the next token and append it to the context (see the sketch below)
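
A sketch of this sampling loop; generate and its parameters are illustrative names, not necessarily the pack's API.

import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, max_seq_len, temperature=1.0):
    # idx: (B, T) token IDs used as the prompt
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -max_seq_len:]             # crop to max_seq_len
        logits = model(idx_cond)                     # (B, T, V)
        logits = logits[:, -1, :] / temperature      # last position, temp-scaled
        probs = F.softmax(logits, dim=-1)            # distribution over vocab
        next_id = torch.multinomial(probs, 1)        # sample one token
        idx = torch.cat([idx, next_id], dim=1)       # append and continue
    return idx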

Note

This pack uses single-head attention for clarity; production GPTs split the same computation across multiple heads.
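
For reference, a single-head causal block could look like the following pre-norm sketch; the layer layout and sizes are assumptions, not the pack's exact code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TransformerBlock(nn.Module):
    # Pre-norm block with one attention head (no multi-head split).
    def __init__(self, d_model):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)        # single head: full width
        self.proj = nn.Linear(d_model, d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                                  # x: (B, T, d_model)
        B, T, C = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        att = (q @ k.transpose(-2, -1)) / (C ** 0.5)       # (B, T, T) scores
        mask = torch.triu(torch.ones(T, T, device=x.device, dtype=torch.bool), 1)
        att = att.masked_fill(mask, float("-inf"))         # causal: no future tokens
        x = x + self.proj(F.softmax(att, dim=-1) @ v)      # attention + residual
        x = x + self.mlp(self.ln2(x))                      # MLP + residual
        return x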
