Mini-GPT

Architecture

Token IDs -> Embeddings -> Transformer blocks -> LayerNorm -> Linear head (logits)
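
The pipeline above can be traced end to end in a few lines of plain Python. This is a minimal sketch, not the deck's actual model: the toy sizes (`VOCAB`, `D_MODEL`, `CONTEXT`), the added position embeddings, the single-head attention block without an MLP, and the LayerNorm without learnable gain or bias are all assumptions made to keep the example short.

```python
import math
import random

random.seed(0)
VOCAB, D_MODEL, CONTEXT = 10, 8, 4  # hypothetical toy sizes


def rand_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(x, w):
    # x has len(w) entries; w is rows x cols; result has cols entries.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def layer_norm(x, eps=1e-5):
    # Normalize one position's vector (no learnable gain/bias in this sketch).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(xs, wq, wk, wv):
    qs = [matvec(x, wq) for x in xs]
    ks = [matvec(x, wk) for x in xs]
    vs = [matvec(x, wv) for x in xs]
    out = []
    for t, q in enumerate(qs):
        # Position t only attends to positions 0..t (the causal mask).
        scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(len(q))
                  for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(w * vs[s][j] for s, w in enumerate(weights))
                    for j in range(len(q))])
    return out


def block(xs, params):
    # Pre-norm residual attention; the usual MLP sub-layer is omitted here.
    attn = causal_attention([layer_norm(x) for x in xs], *params)
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, attn)]


tok_emb = rand_matrix(VOCAB, D_MODEL)
pos_emb = rand_matrix(CONTEXT, D_MODEL)
blocks = [tuple(rand_matrix(D_MODEL, D_MODEL) for _ in range(3)) for _ in range(2)]
w_out = rand_matrix(D_MODEL, VOCAB)


def forward(token_ids):
    # Token IDs -> Embeddings
    xs = [[a + b for a, b in zip(tok_emb[t], pos_emb[i])]
          for i, t in enumerate(token_ids)]
    # -> Blocks
    for params in blocks:
        xs = block(xs, params)
    # -> LayerNorm -> Linear logits (one row of VOCAB logits per position)
    return [matvec(layer_norm(x), w_out) for x in xs]


logits = forward([1, 4, 2])
```

Each input position yields one row of `VOCAB` logits, and because of the causal mask the row at position 0 does not change when later tokens change.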

Objective

Next-token cross-entropy
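
As a rough illustration of the objective, the sketch below computes the mean cross-entropy between the logits at position t and the token at position t + 1. The function names `cross_entropy` and `next_token_loss` are hypothetical, and the logits are assumed to be plain lists with one row per input position.

```python
import math


def cross_entropy(logits, target):
    # -log softmax(logits)[target], computed stably via log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def next_token_loss(logits_per_position, token_ids):
    # The logits at position t are scored against the token at position t + 1.
    targets = token_ids[1:]
    # zip stops at the shorter list, so the last position's logits
    # (which have no target inside this sequence) are ignored.
    losses = [cross_entropy(row, y) for row, y in zip(logits_per_position, targets)]
    return sum(losses) / len(losses)


# Uniform logits over 4 tokens give a loss of log(4) per position.
uniform = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
loss = next_token_loss(uniform, [0, 1, 2])
```

A model that knows nothing starts near `log(vocab_size)`, which makes a handy sanity check for the first training step.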

Training loop

forward -> loss -> backward -> SGD
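
The four steps can be shown on a model small enough to differentiate by hand. The sketch below swaps the transformer for a bigram logit table (an assumption made purely for brevity) and uses the fact that the gradient of softmax cross-entropy with respect to the logits is `probs - one_hot(target)`. It updates on the whole toy dataset each step, so it is plain gradient descent rather than minibatch SGD; `train_bigram`, `steps`, and `lr` are hypothetical names and values.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram(token_ids, vocab_size, steps=200, lr=0.5):
    # W[x] holds the logits for the token that follows token x.
    W = [[0.0] * vocab_size for _ in range(vocab_size)]
    pairs = list(zip(token_ids[:-1], token_ids[1:]))
    history = []
    for _ in range(steps):
        grad = [[0.0] * vocab_size for _ in range(vocab_size)]
        loss = 0.0
        for x, y in pairs:
            probs = softmax(W[x])                      # forward
            loss += -math.log(probs[y]) / len(pairs)   # loss
            for j in range(vocab_size):                # backward
                one_hot = 1.0 if j == y else 0.0
                grad[x][j] += (probs[j] - one_hot) / len(pairs)
        for i in range(vocab_size):                    # SGD-style update
            for j in range(vocab_size):
                W[i][j] -= lr * grad[i][j]
        history.append(loss)
    return W, history


W, history = train_bigram([0, 1, 2, 0, 1, 2, 0, 1, 2], vocab_size=3)
```

On this repeating sequence the loss starts at `log(3)` and falls as the table learns that 0 is followed by 1, 1 by 2, and 2 by 0. In a real setup the backward pass would come from an autodiff framework instead of a hand-written gradient.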

Generation

Crop context, softmax last logits, sample next token
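
A minimal sketch of that loop follows, assuming the model is any callable that maps a list of token IDs to one row of logits per position. `generate`, `toy_model`, and `context_size` are hypothetical names; the toy model stands in for a trained network and simply favours "last token + 1".

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def generate(model, token_ids, max_new_tokens, context_size, rng):
    ids = list(token_ids)
    for _ in range(max_new_tokens):
        context = ids[-context_size:]        # crop to the model's context window
        logits = model(context)              # one row of logits per position
        probs = softmax(logits[-1])          # only the last position predicts the next token
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]  # sample
        ids.append(next_id)
    return ids


def toy_model(context, vocab_size=5):
    # Hypothetical stand-in for a trained model: puts a very large logit
    # on (token + 1) % vocab_size at every position.
    rows = []
    for tok in context:
        row = [0.0] * vocab_size
        row[(tok + 1) % vocab_size] = 1000.0
        rows.append(row)
    return rows


sample = generate(toy_model, [0], max_new_tokens=5, context_size=3,
                  rng=random.Random(0))
```

Passing the random generator in explicitly keeps sampling reproducible. Replacing the `rng.choices` call with an argmax over `probs` would give greedy decoding instead.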
