Token IDs -> Embeddings -> Blocks -> LayerNorm -> Linear logits
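
The data flow above can be sketched in plain Python. This is only an illustration under stated assumptions: `placeholder_block` is a hypothetical stand-in (a causal running mean added residually), not a real attention block, the layer norm has no learned scale or shift, and only token embeddings are used because the outline says just "Embeddings". All names and sizes are made up for the example.

```python
import math
import random

def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (no learned parameters)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(x, weight):
    """Multiply vector x (length d_in) by weight (d_in rows, d_out columns)."""
    return [sum(x[i] * weight[i][j] for i in range(len(x)))
            for j in range(len(weight[0]))]

def placeholder_block(h):
    """Hypothetical stand-in for a transformer block.

    Each position adds the mean of positions 0..t to itself, so information
    only flows from earlier positions to later ones (causal), as in a decoder.
    """
    out = []
    for t in range(len(h)):
        context = [sum(h[s][k] for s in range(t + 1)) / (t + 1)
                   for k in range(len(h[0]))]
        out.append([a + b for a, b in zip(h[t], context)])
    return out

def forward(token_ids, embedding, blocks, head):
    h = [embedding[t] for t in token_ids]   # token IDs -> embeddings
    for block in blocks:                    # -> blocks
        h = block(h)
    h = [layer_norm(v) for v in h]          # -> final LayerNorm
    return [linear(v, head) for v in h]     # -> linear layer producing logits

# Toy sizes, chosen only for illustration.
random.seed(0)
vocab_size, d_model = 5, 4
embedding = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(vocab_size)]
head = [[random.gauss(0, 1) for _ in range(vocab_size)] for _ in range(d_model)]

logits = forward([0, 3, 1], embedding, [placeholder_block, placeholder_block], head)
print(len(logits), len(logits[0]))  # one row of vocab_size logits per input position
```

The output has one logit vector per input position, which is what the loss and sampling steps consume.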
Next-token cross-entropy
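
A minimal sketch of the loss, assuming the model returns one row of logits per position: the logits at position t are scored against the token at position t+1, so the last row has no target and is dropped. The function names are illustrative, not taken from any library.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def next_token_cross_entropy(logits, token_ids):
    """Mean negative log-likelihood of token t+1 under the logits at position t.

    logits: list of rows, one row of vocab-size scores per input position.
    token_ids: the input sequence the logits were computed from.
    """
    targets = token_ids[1:]   # shift left by one: the "next" tokens
    preds = logits[:-1]       # the last position has nothing to predict
    losses = [-log_softmax(row)[tgt] for row, tgt in zip(preds, targets)]
    return sum(losses) / len(losses)

# Uniform logits over 4 tokens: every target has probability 1/4,
# so the loss equals log(4).
uniform = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
print(next_token_cross_entropy(uniform, [1, 2, 3]), math.log(4))
```

An untrained model with near-uniform predictions should therefore start close to log(vocab_size), which is a handy sanity check.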
Forward -> loss -> backward -> SGD step
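
The four stages of one training step can be sketched on a deliberately tiny model. As an assumption for illustration, the "model" here is a hypothetical bigram table `W` whose row `W[x]` holds the logits for the token following `x`; a real network would obtain its gradients from automatic differentiation rather than the hand-written formula below.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(W, inputs, targets, lr):
    """One plain SGD step on a bigram logit table W[current][next]."""
    n = len(inputs)
    # Forward: look up the logits for each input token and normalize them.
    probs = [softmax(W[x]) for x in inputs]
    # Loss: mean next-token cross-entropy.
    loss = -sum(math.log(p[y]) for p, y in zip(probs, targets)) / n
    # Backward: d(loss)/d(logits) = (probs - one_hot(target)) / n,
    # accumulated into the table row that produced those logits.
    grad = [[0.0] * len(W[0]) for _ in W]
    for x, y, p in zip(inputs, targets, probs):
        for j in range(len(p)):
            grad[x][j] += (p[j] - (1.0 if j == y else 0.0)) / n
    # SGD: move every parameter against its gradient.
    for i in range(len(W)):
        for j in range(len(W[0])):
            W[i][j] -= lr * grad[i][j]
    return loss

# Toy data: the repeating pattern 0, 1, 2, 0, 1, 2, ...
tokens = [0, 1, 2, 0, 1, 2, 0, 1, 2]
vocab_size = 3
W = [[0.0] * vocab_size for _ in range(vocab_size)]

losses = [train_step(W, tokens[:-1], tokens[1:], lr=1.0) for _ in range(50)]
print(losses[0], losses[-1])  # starts at log(3) and decreases
```

The returned loss is computed before the update, so the first value reflects the untrained table.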
Crop context, softmax last logits, sample next token
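
A minimal sketch of the sampling loop, assuming `model` is any callable that maps a list of token IDs to one row of logits per position. `toy_model`, `block_size`, and the other names are hypothetical and exist only to make the example runnable.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(model, token_ids, max_new_tokens, block_size, rng):
    """Autoregressively extend token_ids by max_new_tokens sampled tokens."""
    token_ids = list(token_ids)
    for _ in range(max_new_tokens):
        context = token_ids[-block_size:]   # crop to the model's context length
        logits = model(context)             # one row of logits per position
        probs = softmax(logits[-1])         # only the last position predicts the next token
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        token_ids.append(next_id)           # feed the sample back in
    return token_ids

def toy_model(context, vocab_size=4):
    """Hypothetical stand-in model: strongly favors (token + 1) mod vocab_size."""
    rows = []
    for tok in context:
        row = [0.0] * vocab_size
        row[(tok + 1) % vocab_size] = 50.0
        rows.append(row)
    return rows

out = generate(toy_model, [0], max_new_tokens=6, block_size=3, rng=random.Random(0))
print(out)
```

Because the toy logits are so lopsided, sampling here behaves almost deterministically and simply counts upward modulo 4; with a trained model the same loop draws from a genuinely spread-out distribution.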