MLP Language Model
Build a fixed-context MLP language model. The model predicts the next token using a window of previous tokens.
Model
- Token embedding: (V, D)
- Context length: T (block size)
- Flattened input size: T * D
- MLP: T * D -> hidden -> V
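
To make the data flow concrete, here is a small shape walk-through; the sizes used (V=65, D=16, T=8, hidden=128) are illustrative assumptions, not values prescribed by the exercise.

```python
# Illustrative sizes only; the exercise does not fix these numbers.
V, D, T, hidden = 65, 16, 8, 128

context = [1, 5, 42, 0, 7, 7, 13, 2]   # T = 8 previous token IDs
# Each ID selects one (D,)-sized row of the (V, D) embedding table; the T
# resulting vectors are concatenated into a single input of length T * D.
flattened_size = T * D                  # 8 * 16 = 128
# The MLP then maps T * D -> hidden -> V, producing one logit per token.
print(flattened_size)                   # 128
```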
Tasks
1) MLPLanguageModel
```python
from typing import List

# Value and Module are assumed to come from this exercise's autograd library.
class MLPLanguageModel(Module):
    def __init__(self, vocab_size: int, block_size: int, embed_dim: int, hidden_dim: int):
        pass

    def forward(self, token_ids: List[int]) -> List[Value]:
        # token_ids length == block_size
        # return logits of length vocab_size
        pass
```
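
One way to fill in this skeleton is sketched below, under stated assumptions about the autograd API: a micrograd-style Value that mixes with plain floats and provides a tanh() activation, and parameters stored as plain Python lists. It is a sketch, not the required implementation; adapt parameter registration to however the repo's Module collects its parameters.

```python
import random
from typing import List

# Sketch only: Value and Module are assumed from the exercise's autograd
# library, with Value supporting + and * (including with plain floats)
# and a tanh() activation.
class MLPLanguageModel(Module):
    def __init__(self, vocab_size: int, block_size: int, embed_dim: int, hidden_dim: int):
        self.vocab_size = vocab_size
        self.block_size = block_size
        # (V, D) embedding table: one learnable row per token
        self.embedding = [[Value(random.uniform(-0.1, 0.1)) for _ in range(embed_dim)]
                          for _ in range(vocab_size)]
        # first linear layer: (T * D) -> hidden
        self.W1 = [[Value(random.uniform(-0.1, 0.1)) for _ in range(block_size * embed_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [Value(0.0) for _ in range(hidden_dim)]
        # second linear layer: hidden -> V logits
        self.W2 = [[Value(random.uniform(-0.1, 0.1)) for _ in range(hidden_dim)]
                   for _ in range(vocab_size)]
        self.b2 = [Value(0.0) for _ in range(vocab_size)]

    def forward(self, token_ids: List[int]) -> List[Value]:
        assert len(token_ids) == self.block_size
        # look up each token's embedding row and flatten into one (T * D,) input
        x = [v for t in token_ids for v in self.embedding[t]]
        # hidden layer with tanh non-linearity
        h = [sum((w * xi for w, xi in zip(row, x)), self.b1[i]).tanh()
             for i, row in enumerate(self.W1)]
        # one logit per vocabulary entry
        return [sum((w * hi for w, hi in zip(row, h)), self.b2[i])
                for i, row in enumerate(self.W2)]
```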
2) cross_entropy_loss(logits, target)
- Softmax over logits
- Return -log(p_target)
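
A minimal sketch of this function, assuming Value supports mixed arithmetic with plain floats, division, negation, and exposes exp() and log(); if the library's API differs, the same math applies with the operations it does provide.

```python
from typing import List

def cross_entropy_loss(logits: List[Value], target: int) -> Value:
    # Subtract the max logit (a plain constant) for numerical stability;
    # this does not change the softmax probabilities.
    max_logit = max(l.data for l in logits)
    exps = [(l - max_logit).exp() for l in logits]
    total = sum(exps[1:], exps[0])
    p_target = exps[target] / total        # softmax probability of the target
    return -(p_target.log())               # -log(p_target)
```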
3) train_step(model, xs, ys, lr)
- xs: list of contexts (each of length block_size)
- ys: list of target token IDs
- Return mean loss value
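
A sketch of one possible training step, under the assumption that Module provides zero_grad() and parameters(), and that calling backward() on the loss fills in .grad on each parameter; the update shown is plain SGD.

```python
from typing import List

def train_step(model: MLPLanguageModel, xs: List[List[int]], ys: List[int], lr: float) -> float:
    # Forward pass and per-example loss
    losses = [cross_entropy_loss(model.forward(x), y) for x, y in zip(xs, ys)]
    loss = sum(losses[1:], losses[0]) * (1.0 / len(losses))   # mean loss (a Value)
    # Backward pass
    model.zero_grad()
    loss.backward()
    # Plain SGD update on every parameter
    for p in model.parameters():
        p.data -= lr * p.grad
    return loss.data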
4) generate(model, start_ids, max_new_tokens, temperature=1.0)
- Use a rolling context window of size block_size
- Sample next tokens autoregressively
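
A possible shape for generation is sketched below. It assumes the model stores block_size as an attribute and that start_ids already contains at least block_size tokens (left-pad it otherwise); sampling is done on plain floats since no gradients are needed here.

```python
import math
import random
from typing import List

def generate(model: MLPLanguageModel, start_ids: List[int], max_new_tokens: int,
             temperature: float = 1.0) -> List[int]:
    ids = list(start_ids)
    for _ in range(max_new_tokens):
        context = ids[-model.block_size:]                 # rolling window
        logits = [l.data / temperature for l in model.forward(context)]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]          # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        next_id = random.choices(range(len(probs)), weights=probs)[0]
        ids.append(next_id)
    return ids
```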
Notes
- Use Value operations to keep gradients intact.
- block_size must be consistent across training and generation.
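
To illustrate the first note: anything computed from .data is a plain float and falls out of the autograd graph, so keep intermediate results as Value objects (the Value(...) constructor and .data attribute are assumed from the exercise's library).

```python
x = Value(2.0)
good = x * 3.0        # stays a Value, so gradients can flow back to x
bad = x.data * 3.0    # plain float: this breaks the gradient chain
```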
Run the tests to see results.