Mini-GPT: End-to-End Language Model

hard · gpt, transformers, language-models

Build a GPT-style model with multi-head attention, train it with a cross-entropy loss, and generate text autoregressively.

Components to implement

1) Embedding(Module)

Token and positional embedding lookup tables.
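
A minimal NumPy sketch of the lookup idea, forward pass only; the exercise's Module base class, parameter registration, and gradients are not shown, and the 0.02 init scale is an assumption of the sketch:

import numpy as np

class Embedding:
    """Lookup table mapping integer ids to learned D-dimensional vectors."""
    def __init__(self, num_embeddings, embed_dim):
        rng = np.random.default_rng()
        # One row per id; small random init (scale chosen arbitrarily for the sketch).
        self.weight = rng.normal(0.0, 0.02, size=(num_embeddings, embed_dim))

    def __call__(self, ids):
        # ids: integer sequence of length T -> output of shape (T, D)
        return self.weight[np.asarray(ids)]

# The model later adds the two tables elementwise:
#   x = tok_emb(token_ids) + pos_emb(np.arange(T))   # (T, D)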

2) LayerNorm(Module)

Normalize a single vector of length D.
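
A sketch of the normalization itself, written over the last axis so it also works on a (T, D) stack of positions; the learnable gamma/beta and the eps value follow the usual convention and are assumptions here:

import numpy as np

class LayerNorm:
    """Normalize each length-D vector to zero mean and unit variance, then scale and shift."""
    def __init__(self, dim, eps=1e-5):
        self.gamma = np.ones(dim)    # learnable scale
        self.beta = np.zeros(dim)    # learnable shift
        self.eps = eps               # avoids division by zero

    def __call__(self, x):
        # x: (..., D); statistics are computed per vector over the last axis.
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta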

3) FeedForward(Module)

Two linear layers with ReLU activation (D -> 4D -> D).
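
A forward-pass sketch of the position-wise MLP; the bias terms and the init scale are assumptions of the sketch, not part of the exercise spec:

import numpy as np

class FeedForward:
    """Position-wise MLP: D -> 4D -> D with a ReLU in between."""
    def __init__(self, dim):
        rng = np.random.default_rng()
        self.w1 = rng.normal(0.0, 0.02, size=(dim, 4 * dim))
        self.b1 = np.zeros(4 * dim)
        self.w2 = rng.normal(0.0, 0.02, size=(4 * dim, dim))
        self.b2 = np.zeros(dim)

    def __call__(self, x):
        # x: (T, D) -> (T, D); the same transform is applied independently at every position.
        hidden = np.maximum(x @ self.w1 + self.b1, 0.0)   # ReLU
        return hidden @ self.w2 + self.b2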

4) MultiHeadAttention(Module)

Split into heads, run attention per head, concatenate, and project.
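
A forward-pass sketch of the split/attend/concatenate/project pipeline. The causal mask is an assumption based on the model being GPT-style and generating autoregressively, and the projection names (wq, wk, wv, wo) are illustrative:

import numpy as np

class MultiHeadAttention:
    """Split D into num_heads heads, attend per head, concatenate, project back to D."""
    def __init__(self, dim, num_heads):
        assert dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.h, self.dk = num_heads, dim // num_heads
        rng = np.random.default_rng()
        self.wq, self.wk, self.wv, self.wo = (
            rng.normal(0.0, 0.02, size=(dim, dim)) for _ in range(4))

    def __call__(self, x):
        T, D = x.shape
        # Project, then reshape (T, D) -> (num_heads, T, dk) so each head attends independently.
        def split(w):
            return (x @ w).reshape(T, self.h, self.dk).transpose(1, 0, 2)
        q, k, v = split(self.wq), split(self.wk), split(self.wv)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(self.dk)   # (num_heads, T, T)
        causal = np.triu(np.ones((T, T), dtype=bool), k=1)     # True above the diagonal
        scores = np.where(causal, -np.inf, scores)             # no attending to future tokens
        scores -= scores.max(axis=-1, keepdims=True)           # numerically stable softmax over keys
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out = weights @ v                                      # (num_heads, T, dk)
        out = out.transpose(1, 0, 2).reshape(T, D)             # concatenate the heads
        return out @ self.wo                                   # final output projection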

5) TransformerBlock(Module)

Pre-norm block with residuals:

y = x + MHA(LN(x))
z = y + FFN(LN(y))
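
Those two lines translate almost directly into code; this sketch reuses the LayerNorm, MultiHeadAttention, and FeedForward sketches above:

class TransformerBlock:
    """Pre-norm transformer block: normalize, transform, add back into the residual stream."""
    def __init__(self, dim, num_heads):
        self.ln1 = LayerNorm(dim)
        self.attn = MultiHeadAttention(dim, num_heads)
        self.ln2 = LayerNorm(dim)
        self.ffn = FeedForward(dim)

    def __call__(self, x):
        x = x + self.attn(self.ln1(x))   # y = x + MHA(LN(x))
        x = x + self.ffn(self.ln2(x))    # z = y + FFN(LN(y))
        return x

The pre-norm arrangement keeps the residual path an identity, which is what makes deeper stacks of these blocks stable to train.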

6) MiniGPT(Module)

Complete model:

  • token embedding
  • position embedding
  • num_layers transformer blocks
  • final LayerNorm
  • Linear projection to vocab size
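
A sketch of how those pieces compose into a single forward pass, reusing the component sketches above; the plain lm_head matrix and the constructor argument names are assumptions of the sketch:

import numpy as np

class MiniGPT:
    """Token + position embeddings, a stack of blocks, a final LayerNorm, and an LM head."""
    def __init__(self, vocab_size, max_seq_len, embed_dim, num_heads, num_layers):
        self.max_seq_len = max_seq_len
        self.tok_emb = Embedding(vocab_size, embed_dim)
        self.pos_emb = Embedding(max_seq_len, embed_dim)
        self.blocks = [TransformerBlock(embed_dim, num_heads) for _ in range(num_layers)]
        self.ln_f = LayerNorm(embed_dim)
        self.lm_head = np.random.default_rng().normal(0.0, 0.02, size=(embed_dim, vocab_size))

    def __call__(self, token_ids):
        T = len(token_ids)
        assert T <= self.max_seq_len, "input length must not exceed max_seq_len"
        x = self.tok_emb(token_ids) + self.pos_emb(np.arange(T))   # (T, D)
        for block in self.blocks:
            x = block(x)
        return self.ln_f(x) @ self.lm_head                         # logits: (T, vocab_size)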

7) Helpers

  • softmax
  • cross_entropy_loss
  • generate
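
Sketches of softmax and cross_entropy_loss (generate is sketched after the requirements below); both rely on the standard max-subtraction trick for numerical stability:

import numpy as np

def softmax(x, axis=-1):
    """Exponentiate and normalize; subtracting the max first avoids overflow."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=axis, keepdims=True)

def cross_entropy_loss(logits, targets):
    """Mean negative log-likelihood of the target id at each position.

    logits: (T, vocab_size), targets: (T,) integer ids.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()   # average over positions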

Requirements

  • Input token_ids length T must be <= max_seq_len.
  • Output logits shape: (T, vocab_size).
  • cross_entropy_loss averages over positions.
  • generate crops context to max_seq_len and samples next token.
  • embed_dim must be divisible by num_heads.
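
A sketch of a generate loop that satisfies these requirements, assuming the MiniGPT and softmax sketches above; the num_new_tokens parameter name is illustrative:

import numpy as np

def generate(model, token_ids, num_new_tokens):
    """Append num_new_tokens tokens by repeatedly sampling from the next-token distribution."""
    rng = np.random.default_rng()
    token_ids = list(token_ids)
    for _ in range(num_new_tokens):
        context = token_ids[-model.max_seq_len:]   # crop context to the last max_seq_len tokens
        logits = model(context)                    # (T, vocab_size)
        probs = softmax(logits[-1])                # distribution over the next token
        token_ids.append(int(rng.choice(len(probs), p=probs)))
    return token_ids

Temperature or top-k sampling are common extensions, but plain sampling from the softmax of the last position's logits is all the requirement asks for.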