Mini-GPT: End-to-End Language Model
Build a GPT-style model with multi-head attention, train it with cross-entropy, and generate text autoregressively.
Components to implement
1) Embedding(Module)
Token and positional embedding lookup tables.
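One way to read this is a generic lookup table instantiated twice, once for tokens and once for positions. A minimal forward-only NumPy sketch (the initializer scale, the `forward` method name, and the plain class in place of a `Module` base are assumptions, not part of the spec):

```python
import numpy as np

class Embedding:
    """Lookup table: maps integer ids to rows of a (num_embeddings, embed_dim) matrix."""
    def __init__(self, num_embeddings, embed_dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.weight = rng.normal(0.0, 0.02, (num_embeddings, embed_dim))

    def forward(self, ids):
        # ids: (T,) int array -> (T, embed_dim)
        return self.weight[np.asarray(ids)]

# Used twice in MiniGPT: Embedding(vocab_size, D) for tokens and
# Embedding(max_seq_len, D) for positions (indexed by 0..T-1).
```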
2) LayerNorm(Module)
Normalize a single vector of length D.
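A minimal sketch, assuming the normalization is taken over the last axis and includes learnable scale and shift parameters:

```python
import numpy as np

class LayerNorm:
    """Normalize the last dimension to zero mean / unit variance, then scale and shift."""
    def __init__(self, dim, eps=1e-5):
        self.gamma = np.ones(dim)   # learnable scale
        self.beta = np.zeros(dim)   # learnable shift
        self.eps = eps

    def forward(self, x):
        # x: (..., dim); statistics are computed per vector over the last axis.
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta
```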
3) FeedForward(Module)
Two linear layers with ReLU activation (D -> 4D -> D).
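A forward-only sketch of the D -> 4D -> D MLP; the weight initialization and bias terms are assumptions:

```python
import numpy as np

class FeedForward:
    """Position-wise MLP: D -> 4D -> D with a ReLU in between."""
    def __init__(self, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w1 = rng.normal(0.0, 0.02, (dim, 4 * dim))
        self.b1 = np.zeros(4 * dim)
        self.w2 = rng.normal(0.0, 0.02, (4 * dim, dim))
        self.b2 = np.zeros(dim)

    def forward(self, x):
        # x: (T, dim) -> (T, dim), applied independently at each position.
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU
        return h @ self.w2 + self.b2
```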
4) MultiHeadAttention(Module)
Split into heads, run attention per head, concatenate, and project.
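A sketch of causal multi-head self-attention following that recipe. It inlines a softmax helper so it runs standalone; the causal mask, scaled dot-product scoring, and single-sequence (no batch dimension) layout are assumptions consistent with the rest of the spec:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MultiHeadAttention:
    """Causal self-attention: project to Q/K/V, split into heads, attend, concat, project."""
    def __init__(self, embed_dim, num_heads, rng=None):
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        rng = rng or np.random.default_rng(0)
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        init = lambda *shape: rng.normal(0.0, 0.02, shape)
        self.wq, self.wk, self.wv, self.wo = (init(embed_dim, embed_dim) for _ in range(4))

    def forward(self, x):
        # x: (T, D) -> (T, D)
        T, D = x.shape
        H, hd = self.num_heads, self.head_dim
        # Project, then reshape (T, D) -> (H, T, head_dim) to run all heads at once.
        q = (x @ self.wq).reshape(T, H, hd).transpose(1, 0, 2)
        k = (x @ self.wk).reshape(T, H, hd).transpose(1, 0, 2)
        v = (x @ self.wv).reshape(T, H, hd).transpose(1, 0, 2)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)       # (H, T, T)
        mask = np.triu(np.ones((T, T), dtype=bool), k=1)      # causal: no attending to the future
        scores = np.where(mask, -np.inf, scores)
        out = softmax(scores, axis=-1) @ v                    # (H, T, head_dim)
        out = out.transpose(1, 0, 2).reshape(T, D)            # concatenate heads
        return out @ self.wo                                  # output projection
```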
5) TransformerBlock(Module)
Pre-norm block with residuals:
y = x + MHA(LN(x))
z = y + FFN(LN(y))
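These two equations translate directly into a block that reuses the LayerNorm, MultiHeadAttention, and FeedForward sketches above:

```python
class TransformerBlock:
    """Pre-norm block: y = x + MHA(LN(x)); z = y + FFN(LN(y))."""
    def __init__(self, embed_dim, num_heads):
        self.ln1 = LayerNorm(embed_dim)
        self.attn = MultiHeadAttention(embed_dim, num_heads)
        self.ln2 = LayerNorm(embed_dim)
        self.ffn = FeedForward(embed_dim)

    def forward(self, x):
        # x: (T, D) -> (T, D); residual connection around each sub-layer.
        x = x + self.attn.forward(self.ln1.forward(x))
        x = x + self.ffn.forward(self.ln2.forward(x))
        return x
```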
6) MiniGPT(Module)
Complete model:
- token embedding
- position embedding
- num_layers transformer blocks
- final LayerNorm
- linear projection to vocab size
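Wiring those pieces together, a forward-only sketch of the full model (constructor argument order and the plain weight matrix for the final projection are assumptions):

```python
import numpy as np

class MiniGPT:
    """Embeddings -> num_layers transformer blocks -> final LayerNorm -> vocab projection."""
    def __init__(self, vocab_size, max_seq_len, embed_dim, num_heads, num_layers, rng=None):
        rng = rng or np.random.default_rng(0)
        self.max_seq_len = max_seq_len
        self.tok_emb = Embedding(vocab_size, embed_dim, rng)
        self.pos_emb = Embedding(max_seq_len, embed_dim, rng)
        self.blocks = [TransformerBlock(embed_dim, num_heads) for _ in range(num_layers)]
        self.ln_f = LayerNorm(embed_dim)
        self.head = rng.normal(0.0, 0.02, (embed_dim, vocab_size))  # final linear projection

    def forward(self, token_ids):
        # token_ids: (T,) with T <= max_seq_len; returns logits of shape (T, vocab_size).
        T = len(token_ids)
        assert T <= self.max_seq_len
        x = self.tok_emb.forward(token_ids) + self.pos_emb.forward(np.arange(T))
        for block in self.blocks:
            x = block.forward(x)
        return self.ln_f.forward(x) @ self.head
```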
7) Helpers
softmax, cross_entropy_loss, generate. A minimal sketch of each follows below.
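These sketches assume the MiniGPT interface above (a forward method returning (T, vocab_size) logits and a max_seq_len attribute); the epsilon inside the log and the greedy-free multinomial sampling are implementation choices, not mandated by the spec:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy_loss(logits, targets):
    # logits: (T, vocab_size), targets: (T,) int; mean negative log-likelihood over positions.
    probs = softmax(logits, axis=-1)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))

def generate(model, token_ids, num_new_tokens, rng=None):
    # Autoregressive sampling: feed the (cropped) context, sample the next token, append, repeat.
    rng = rng or np.random.default_rng(0)
    token_ids = list(token_ids)
    for _ in range(num_new_tokens):
        context = token_ids[-model.max_seq_len:]          # crop context to max_seq_len
        logits = model.forward(np.array(context))         # (T, vocab_size)
        probs = softmax(logits[-1])                       # distribution over the next token
        token_ids.append(int(rng.choice(len(probs), p=probs)))
    return token_ids
```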
Requirements
- Input token_ids of length T must satisfy T <= max_seq_len.
- Output logits shape: (T, vocab_size).
- cross_entropy_loss averages over positions.
- generate crops the context to max_seq_len and samples the next token.
- embed_dim must be divisible by num_heads.
A small smoke test exercising these requirements is sketched below.
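A hypothetical smoke test tying the sketches above together; the sizes are chosen arbitrarily for illustration:

```python
import numpy as np

vocab_size, max_seq_len, embed_dim, num_heads, num_layers = 50, 16, 32, 4, 2
model = MiniGPT(vocab_size, max_seq_len, embed_dim, num_heads, num_layers)

token_ids = np.array([1, 5, 7, 2])                      # T = 4 <= max_seq_len
logits = model.forward(token_ids)
assert logits.shape == (len(token_ids), vocab_size)     # (T, vocab_size)

loss = cross_entropy_loss(logits[:-1], token_ids[1:])   # next-token targets
sample = generate(model, token_ids, num_new_tokens=8)
print(loss, sample)
```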