Parameters: W, shape (V, V)
Logits for token t: W[t]
Loss: cross-entropy over next-token targets
Autoregressive: sample the next token from softmax(W[t]), where t is the current token
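
The points above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the author's implementation: the helper names (softmax, bigram_loss, generate), the toy vocabulary size V = 5, the all-zero W, and the token sequence are all hypothetical, and W is stored as a list of lists so that W[t] is row t.

```python
import math
import random

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bigram_loss(W, tokens):
    # Each token predicts the one that follows it; average the
    # negative log-probability assigned to the true next token.
    pairs = list(zip(tokens[:-1], tokens[1:]))
    nll = 0.0
    for t, target in pairs:
        probs = softmax(W[t])  # logits for token t are row W[t]
        nll -= math.log(probs[target])
    return nll / len(pairs)

def generate(W, start, steps, rng):
    # Autoregressive sampling: the last emitted token selects the row
    # of W whose softmax is the distribution over the next token.
    out = [start]
    for _ in range(steps):
        probs = softmax(W[out[-1]])
        nxt = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        out.append(nxt)
    return out

# Hypothetical toy setup: V = 5 and W initialised to zeros.
V = 5
W = [[0.0] * V for _ in range(V)]
tokens = [0, 1, 2, 3, 4, 0, 1]

loss = bigram_loss(W, tokens)
sample = generate(W, start=0, steps=10, rng=random.Random(0))
```

With W all zeros every row gives a uniform distribution, so the loss equals log(V); training would lower it by adjusting the rows of W.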