Bigram Language Model

Model

Parameters: a weight matrix W of shape (V, V), where V is the vocabulary size. Logits for current token t: the row W[t]
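A minimal sketch of this parameterization (the toy V and random init are illustrative, not from the slides):

```python
import numpy as np

V = 5                          # toy vocabulary size
rng = np.random.default_rng(0)
W = rng.normal(size=(V, V))    # one row of next-token logits per current token

t = 2
logits = W[t]                  # unnormalized scores over the next token
assert logits.shape == (V,)
```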

Loss

Cross-entropy over next-token targets
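A sketch of the loss, assuming batches of (current token, next token) index pairs; the function name and the log-sum-exp stabilization are illustrative choices:

```python
import numpy as np

def cross_entropy(W, xs, ys):
    """Mean cross-entropy of next-token targets ys given current tokens xs."""
    logits = W[xs]                                        # (N, V)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)             # softmax per row
    return -np.log(probs[np.arange(len(ys)), ys]).mean()
```

With W all zeros the model is uniform, so the loss equals log(V), a useful sanity check at initialization.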

Training

  • forward
  • zero grads
  • backward
  • SGD step
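The four steps above map onto a standard autograd training loop. A NumPy sketch with the gradient written by hand (so "zero grads" and "backward" collapse into the explicit gradient computation; `train_step` and the learning rate are illustrative):

```python
import numpy as np

def train_step(W, xs, ys, lr=0.1):
    """One forward/backward/SGD step on (current, next) token pairs; mutates W."""
    N = len(xs)
    # forward: softmax over the logit row for each current token
    logits = W[xs]                                        # fancy indexing copies
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), ys]).mean()
    # backward: dL/dlogits = (probs - onehot(y)) / N
    dlogits = probs
    dlogits[np.arange(N), ys] -= 1.0
    dlogits /= N
    grad = np.zeros_like(W)
    np.add.at(grad, xs, dlogits)   # scatter-add rows back into W's gradient
    # SGD step
    W -= lr * grad
    return loss
```

Repeated calls on the same pairs should drive the loss down from its uniform starting value of log(V).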
Sampling

Autoregressive: sample the next token from softmax(W[t]), then feed it back in as the new t
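A sketch of that sampling loop (the function name, start token, and generation length are illustrative):

```python
import numpy as np

def sample(W, start, length, rng):
    """Generate `length` tokens autoregressively from a bigram logit matrix W."""
    tokens = [start]
    for _ in range(length):
        logits = W[tokens[-1]]            # logit row for the current token
        logits = logits - logits.max()    # numerical stability
        p = np.exp(logits)
        p /= p.sum()                      # softmax -> next-token distribution
        tokens.append(int(rng.choice(len(p), p=p)))
    return tokens
```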
