Bigram Language Model
Build a trainable bigram language model. This is the smallest end-to-end LM: it predicts the next token using only the current token.
Model
A bigram LM stores logits for each token pair:
- Parameters: `W` of shape `(V, V)`
- Given token `t`, logits are `W[t]`
You can implement this with a single embedding table where `embedding_dim = vocab_size`.
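For intuition, here is a tiny plain-Python illustration with made-up numbers and `V = 3` (no autograd; the trainable version comes in the tasks below):

```python
V = 3
W = [
    [0.1, 2.0, -1.0],   # logits for whatever follows token 0
    [0.5, 0.5, 0.5],    # logits for whatever follows token 1
    [-2.0, 0.0, 3.0],   # logits for whatever follows token 2
]

t = 2           # current token id
logits = W[t]   # row lookup: scores for every possible next token
print(logits)   # [-2.0, 0.0, 3.0]
```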
Tasks
1) `softmax(scores)`
- Accepts a list of `Value`
- Must be numerically stable (subtract max)
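For the numerics only, here is a sketch on plain floats; the exercise version should build the same expression out of `Value` operations so it stays differentiable:

```python
import math

def softmax_floats(scores):
    # Subtracting the max shifts every score by a constant, which leaves the
    # softmax output unchanged but keeps exp() from overflowing.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax_floats([1000.0, 1001.0, 1002.0]))  # works; naive exp(1000.0) overflows
```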
2) `cross_entropy_loss(logits, targets)`
- `logits`: list of length `T`, each of length `V`
- `targets`: list of length `T`
- Return the mean negative log-probability of the correct targets
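A plain-float sketch of the quantity being computed (again, the exercise version should express this with `Value` operations):

```python
import math

def cross_entropy_floats(logits, targets):
    # logits: T rows of V raw scores; targets: T indices of the correct next token.
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        log_prob_correct = row[t] - log_sum_exp  # log softmax at the target index
        total += -log_prob_correct               # negative log-likelihood
    return total / len(targets)                  # mean over the T positions

print(cross_entropy_floats([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]], [0, 2]))  # ~0.67
```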
3) `BigramLM(Module)`

```python
class BigramLM(Module):
    def __init__(self, vocab_size: int):
        pass

    def forward(self, token_ids: List[int]) -> List[List[Value]]:
        pass
```
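A minimal sketch of one way to fill this in, assuming a micrograd-style `Value` class (constructed from a float) and a `Module` base with a `parameters()` convention from earlier exercises; everything beyond the skeleton above is an assumption, not the reference solution:

```python
import random

class BigramLM(Module):
    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size
        # W has shape (V, V): one trainable row of V logits per current token.
        # Assumes Value(x) wraps a float, as in a micrograd-style autograd engine.
        self.W = [[Value(random.gauss(0.0, 0.1)) for _ in range(vocab_size)]
                  for _ in range(vocab_size)]

    def parameters(self):
        # Flatten the table so the optimizer can iterate over every weight.
        return [p for row in self.W for p in row]

    def forward(self, token_ids):
        # A bigram model is just a row lookup: token t selects logits W[t].
        return [self.W[t] for t in token_ids]
```

Because `forward` only indexes into `W`, the returned logits are the parameter `Value`s themselves, so gradients from the loss flow straight back into the table.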
4) `train_step(model, x, y, lr)`
- Forward -> loss -> backward -> SGD update
- Return `loss.data`
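A sketch under the assumption of micrograd-style conventions (`model.parameters()`, a `.grad` attribute on each parameter, `loss.backward()`); adjust the names to your own `Value`/`Module` API:

```python
def train_step(model, x, y, lr):
    # Forward: per-position logits, then the scalar training loss.
    logits = model.forward(x)
    loss = cross_entropy_loss(logits, y)

    # Backward: clear stale gradients, then backpropagate through the graph.
    for p in model.parameters():
        p.grad = 0.0
    loss.backward()

    # SGD update: move each parameter a small step against its gradient.
    for p in model.parameters():
        p.data -= lr * p.grad

    return loss.data
```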
5) `generate(model, start_ids, max_new_tokens, temperature=1.0)`
- Autoregressively sample next tokens
- Use the logits from the last position
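One possible sketch, assuming `forward` returns rows of `Value` logits (so `.data` extracts the float) and reusing the stable-softmax trick from task 1; the sampling details are illustrative:

```python
import math
import random

def generate(model, start_ids, max_new_tokens, temperature=1.0):
    ids = list(start_ids)  # the returned sequence includes the prompt
    for _ in range(max_new_tokens):
        logits = model.forward(ids)[-1]                  # logits at the last position
        scores = [v.data / temperature for v in logits]  # temperature-scaled scores
        m = max(scores)                                  # stable softmax over floats
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        next_id = random.choices(range(len(probs)), weights=probs, k=1)[0]
        ids.append(next_id)
    return ids
```

Since a bigram model conditions only on the current token, passing just `[ids[-1]]` to `forward` would give the same distribution more cheaply.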
Notes
- Use `Value` operations so gradients flow to parameters.
- `generate` should return a list of token IDs including the prompt.