MLP Language Model

hard · language-models, mlp, context-window

Build a fixed-context MLP language model. The model predicts the next token from a window of the previous block_size tokens.

Model

  • Token embedding: (V, D)
  • Context length: T (block size)
  • Flattened input size: T * D
  • MLP: T*D -> hidden -> V
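
For a concrete sense of the sizes involved, here is a hypothetical configuration (the numbers are illustrative, not prescribed by the exercise):

V, T, D, H = 65, 8, 16, 128          # vocab, block size, embed dim, hidden dim
flat_in = T * D                      # 8 * 16 = 128 flattened input features
n_params = V * D + (flat_in * H + H) + (H * V + V)
print(n_params)                      # 1040 + 16512 + 8385 = 25937 parameters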

Tasks

1) MLPLanguageModel

class MLPLanguageModel(Module):
    def __init__(self, vocab_size: int, block_size: int, embed_dim: int, hidden_dim: int):
        pass

    def forward(self, token_ids: List[int]) -> List[Value]:
        # token_ids length == block_size
        # return logits of length vocab_size
        pass
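
One possible sketch of the class, assuming a micrograd-style Value (supporting +, *, and tanh()) and a Module base class whose parameters() collects the Values created below; the Gaussian initialization is an illustrative choice, not prescribed by the exercise:

import random
from typing import List

class MLPLanguageModel(Module):
    def __init__(self, vocab_size: int, block_size: int, embed_dim: int, hidden_dim: int):
        self.vocab_size = vocab_size
        self.block_size = block_size
        # Embedding table: one row of embed_dim Values per token id.
        self.embedding = [[Value(random.gauss(0, 0.1)) for _ in range(embed_dim)]
                          for _ in range(vocab_size)]
        in_dim = block_size * embed_dim
        # Hidden layer (in_dim -> hidden_dim) with tanh nonlinearity.
        self.w1 = [[Value(random.gauss(0, 0.1)) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [Value(0.0) for _ in range(hidden_dim)]
        # Output layer (hidden_dim -> vocab_size) producing the logits.
        self.w2 = [[Value(random.gauss(0, 0.1)) for _ in range(hidden_dim)]
                   for _ in range(vocab_size)]
        self.b2 = [Value(0.0) for _ in range(vocab_size)]

    def forward(self, token_ids: List[int]) -> List[Value]:
        assert len(token_ids) == self.block_size
        # Look up and flatten the embeddings of the context window.
        x = [v for t in token_ids for v in self.embedding[t]]
        # Hidden activations.
        h = [sum((wi * xi for wi, xi in zip(row, x)), self.b1[j]).tanh()
             for j, row in enumerate(self.w1)]
        # Logits, one per vocabulary entry.
        return [sum((wi * hi for wi, hi in zip(row, h)), self.b2[k])
                for k, row in enumerate(self.w2)]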

2) cross_entropy_loss(logits, target)

  • Softmax over logits
  • Return -log(p_target)
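
A minimal sketch, assuming the Value class provides exp() and log(). Subtracting the maximum raw logit (a plain float) before exponentiating leaves the softmax unchanged but prevents overflow:

from typing import List

def cross_entropy_loss(logits: List[Value], target: int) -> Value:
    m = max(v.data for v in logits)          # constant shift for numerical stability
    exps = [(v - m).exp() for v in logits]
    total = sum(exps)
    p_target = exps[target] / total          # softmax probability of the target
    return -p_target.log()                   # negative log-likelihood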

3) train_step(model, xs, ys, lr)

  • xs: list of contexts (each of length block_size)
  • ys: list of target token IDs
  • Run a forward/backward pass over the batch, apply one SGD update with lr, and return the mean loss
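
A sketch of one training step, assuming Module.parameters() returns every trainable Value and each Value exposes .data, .grad, and backward() (micrograd-style):

from typing import List

def train_step(model: MLPLanguageModel, xs: List[List[int]], ys: List[int], lr: float) -> float:
    # Mean cross-entropy over the batch (builds one computation graph).
    losses = [cross_entropy_loss(model.forward(x), y) for x, y in zip(xs, ys)]
    mean_loss = sum(losses) * (1.0 / len(losses))
    # Zero gradients, backprop, then take one SGD step.
    for p in model.parameters():
        p.grad = 0.0
    mean_loss.backward()
    for p in model.parameters():
        p.data -= lr * p.grad
    return mean_loss.data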

4) generate(model, start_ids, max_new_tokens, temperature=1.0)

  • Use a rolling context window of size block_size
  • Sample next tokens autoregressively
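
A sketch of sampling; no gradients are needed here, so the softmax runs on plain floats. Left-padding prompts shorter than block_size with token 0 is an assumption, not something the exercise specifies:

import math
import random
from typing import List

def generate(model: MLPLanguageModel, start_ids: List[int], max_new_tokens: int,
             temperature: float = 1.0) -> List[int]:
    ids = list(start_ids)
    for _ in range(max_new_tokens):
        # Rolling context: the last block_size ids, padded on the left if short.
        context = ids[-model.block_size:]
        if len(context) < model.block_size:
            context = [0] * (model.block_size - len(context)) + context
        # Temperature-scaled softmax over the raw logit values.
        logits = [v.data / temperature for v in model.forward(context)]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        next_id = random.choices(range(model.vocab_size), weights=probs, k=1)[0]
        ids.append(next_id)
    return ids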

Notes

  • Use Value operations to keep gradients intact.
  • block_size must be consistent across training and generation.