Mini-GPT: Character-Level Language Model

hard · gpt, transformers, language-models, capstone

Build a complete GPT-style model from scratch using your autograd engine and module system. This is the capstone exercise of the pack.

What you are building

1) Embedding(Module)

A lookup table mapping integer indices to learned vectors of length embedding_dim, used for both token and position embeddings.

class Embedding(Module):
    def __init__(self, num_embeddings: int, embedding_dim: int):
        pass

    def forward(self, indices: List[int]) -> List[List[Value]]:
        pass
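
A minimal sketch, assuming Value can wrap a float and that your Module base class discovers parameters stored in nested lists (if it only scans certain attribute types, adapt the storage). The 0.02 initialization scale is an arbitrary choice.

import random
from typing import List

class Embedding(Module):
    def __init__(self, num_embeddings: int, embedding_dim: int):
        super().__init__()
        # One learnable row of Values per index.
        self.weight = [
            [Value(random.gauss(0.0, 0.02)) for _ in range(embedding_dim)]
            for _ in range(num_embeddings)
        ]

    def forward(self, indices: List[int]) -> List[List[Value]]:
        # Pure lookup: return the stored row for each requested index.
        return [self.weight[i] for i in indices]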

2) LayerNorm(Module)

Normalize a single vector of length D to zero mean and unit variance, with eps for numerical stability.

class LayerNorm(Module):
    def __init__(self, dim: int, eps: float = 1e-5):
        pass

    def forward(self, x: List[Value]) -> List[Value]:
        pass
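
One possible sketch, assuming Value supports the usual arithmetic (including ** with a float exponent) and implements __radd__ so Python's built-in sum works. The learnable scale (gamma) and shift (beta) are standard LayerNorm parameters; drop them if your tests expect plain normalization only.

from typing import List

class LayerNorm(Module):
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = [Value(1.0) for _ in range(dim)]  # learnable scale
        self.beta = [Value(0.0) for _ in range(dim)]   # learnable shift

    def forward(self, x: List[Value]) -> List[Value]:
        n = len(x)
        mean = sum(x) / n
        var = sum((xi - mean) ** 2 for xi in x) / n
        inv_std = (var + self.eps) ** -0.5
        return [(xi - mean) * inv_std * g + b
                for xi, g, b in zip(x, self.gamma, self.beta)]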

3) FeedForward(Module)

Two linear layers with a nonlinearity in between; when hidden_dim is None, the usual GPT choice is 4 * embed_dim.

class FeedForward(Module):
    def __init__(self, embed_dim: int, hidden_dim: int | None = None):
        pass

    def forward(self, x: List[Value]) -> List[Value]:
        pass
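
A sketch assuming a Linear module from an earlier exercise (mapping List[Value] -> List[Value]) and a relu() method on Value; substitute tanh() or whatever activation your pack provides.

from typing import List

class FeedForward(Module):
    def __init__(self, embed_dim: int, hidden_dim: int | None = None):
        super().__init__()
        hidden_dim = hidden_dim if hidden_dim is not None else 4 * embed_dim
        self.fc1 = Linear(embed_dim, hidden_dim)
        self.fc2 = Linear(hidden_dim, embed_dim)

    def forward(self, x: List[Value]) -> List[Value]:
        # Expand, apply the nonlinearity, project back down.
        h = [v.relu() for v in self.fc1(x)]
        return self.fc2(h)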

4) SelfAttention(Module)

Single-head scaled dot-product self-attention. When causal is True, each position attends only to itself and earlier positions.

class SelfAttention(Module):
    def __init__(self, embed_dim: int):
        pass

    def forward(self, x: List[List[Value]], causal: bool = True) -> List[List[Value]]:
        pass
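
One way to sketch it, again assuming a Linear module and that Value provides exp() and works with Python's sum(). The causal mask is implemented by letting query position i attend only to positions j <= i.

import math
from typing import List

class SelfAttention(Module):
    def __init__(self, embed_dim: int):
        super().__init__()
        self.embed_dim = embed_dim
        self.q_proj = Linear(embed_dim, embed_dim)
        self.k_proj = Linear(embed_dim, embed_dim)
        self.v_proj = Linear(embed_dim, embed_dim)

    def forward(self, x: List[List[Value]], causal: bool = True) -> List[List[Value]]:
        T, D = len(x), self.embed_dim
        q = [self.q_proj(xi) for xi in x]
        k = [self.k_proj(xi) for xi in x]
        v = [self.v_proj(xi) for xi in x]
        scale = 1.0 / math.sqrt(D)
        out = []
        for i in range(T):
            js = list(range(i + 1)) if causal else list(range(T))
            # Scaled dot-product scores against the visible positions.
            scores = [sum(q[i][d] * k[j][d] for d in range(D)) * scale for j in js]
            exps = [s.exp() for s in scores]
            total = sum(exps)
            weights = [e / total for e in exps]
            # Weighted sum of the value vectors.
            out.append([sum(w * v[j][d] for w, j in zip(weights, js))
                        for d in range(D)])
        return out

A full GPT attention layer also applies an output projection after the weighted sum; add one if your tests expect it.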

5) TransformerBlock(Module)

Pre-norm block with residual connections, i.e. x = x + attn(ln1(x)) followed by x = x + ff(ln2(x)):

class TransformerBlock(Module):
    def __init__(self, embed_dim: int, num_heads: int):
        # num_heads is included for API compatibility
        pass

    def forward(self, x: List[List[Value]]) -> List[List[Value]]:
        pass
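
With the pieces above, the block is mostly wiring. A sketch (the residual additions are elementwise over each row of length D):

from typing import List

class TransformerBlock(Module):
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        # num_heads is accepted for API compatibility; attention is single-head.
        self.ln1 = LayerNorm(embed_dim)
        self.attn = SelfAttention(embed_dim)
        self.ln2 = LayerNorm(embed_dim)
        self.ff = FeedForward(embed_dim)

    def forward(self, x: List[List[Value]]) -> List[List[Value]]:
        # x = x + attn(ln1(x))
        attended = self.attn([self.ln1(xi) for xi in x])
        x = [[a + b for a, b in zip(xi, ai)] for xi, ai in zip(x, attended)]
        # x = x + ff(ln2(x))
        x = [[a + b for a, b in zip(xi, self.ff(self.ln2(xi)))] for xi in x]
        return x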

6) MiniGPT(Module)

Complete model: token + position embeddings -> transformer blocks -> final LayerNorm -> output projection to vocabulary logits.

class MiniGPT(Module):
    def __init__(self, vocab_size: int, embed_dim: int, num_heads: int,
                 num_layers: int, max_seq_len: int):
        pass

    def forward(self, token_ids: List[int]) -> List[List[Value]]:
        # returns logits of shape (T, vocab_size)
        pass
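
A sketch of the full model, assuming the components above plus a Linear output head. Keeping the blocks in a plain Python list only works if your Module's parameter discovery traverses lists; otherwise register each block as an attribute.

from typing import List

class MiniGPT(Module):
    def __init__(self, vocab_size: int, embed_dim: int, num_heads: int,
                 num_layers: int, max_seq_len: int):
        super().__init__()
        self.max_seq_len = max_seq_len
        self.tok_emb = Embedding(vocab_size, embed_dim)
        self.pos_emb = Embedding(max_seq_len, embed_dim)
        self.blocks = [TransformerBlock(embed_dim, num_heads) for _ in range(num_layers)]
        self.ln_f = LayerNorm(embed_dim)
        self.head = Linear(embed_dim, vocab_size)

    def forward(self, token_ids: List[int]) -> List[List[Value]]:
        T = len(token_ids)
        assert T <= self.max_seq_len, "sequence longer than max_seq_len"
        tok = self.tok_emb(token_ids)
        pos = self.pos_emb(list(range(T)))
        # (T, D): sum of token and position embeddings.
        x = [[t + p for t, p in zip(ti, pi)] for ti, pi in zip(tok, pos)]
        for block in self.blocks:
            x = block(x)
        x = [self.ln_f(xi) for xi in x]
        return [self.head(xi) for xi in x]  # logits, shape (T, vocab_size)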

7) Training + generation helpers

def cross_entropy_loss(logits: List[List[Value]], targets: List[int]) -> Value:
    pass

def generate(model, start_ids, max_new_tokens, temperature=1.0) -> List[int]:
    pass
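
A possible sketch of both helpers, assuming Value provides exp(), log(), and a .data attribute holding the raw float (used for the numerically stable shift and for sampling outside the autograd graph).

import math
import random
from typing import List

def cross_entropy_loss(logits: List[List[Value]], targets: List[int]) -> Value:
    losses = []
    for row, target in zip(logits, targets):
        # Shift by the max logit (a constant, so gradients are unchanged)
        # before exponentiating, to avoid overflow.
        m = max(v.data for v in row)
        shifted = [v - m for v in row]
        log_sum_exp = sum(s.exp() for s in shifted).log()
        # Negative log-probability of the target class at this position.
        losses.append(log_sum_exp - shifted[target])
    return sum(losses) / len(losses)

def generate(model, start_ids: List[int], max_new_tokens: int,
             temperature: float = 1.0) -> List[int]:
    ids = list(start_ids)
    for _ in range(max_new_tokens):
        context = ids[-model.max_seq_len:]       # crop to the context window
        next_logits = model(context)[-1]         # logits for the next token
        scaled = [v.data / temperature for v in next_logits]
        m = max(scaled)
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        probs = [e / total for e in exps]
        ids.append(random.choices(range(len(probs)), weights=probs)[0])
    return ids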

Requirements and notes

  • Shapes:
    • token_ids: (T,)
    • embeddings: (T, D)
    • logits: (T, V)
  • T must not exceed max_seq_len.
  • Cross-entropy should compute softmax internally and average over positions.
  • generate should crop context to max_seq_len, apply temperature, and sample.
  • This pack uses single-head attention; num_heads can be ignored or used for validation.

Example

text = "hello"
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
encode = lambda s: [char_to_idx[c] for c in s]

model = MiniGPT(vocab_size=len(chars), embed_dim=16, num_heads=2, num_layers=2, max_seq_len=8)

x = encode(text)
logits = model(x)
# Targets are the inputs shifted left by one; the final position simply repeats the last token.
loss = cross_entropy_loss(logits, x[1:] + [x[-1]])
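
From here, a few steps of plain gradient descent are enough to watch the loss fall. A sketch, assuming your Module exposes parameters() and zero_grad() and Value exposes .data, .grad, and .backward(); the learning rate and step count are arbitrary.

targets = x[1:] + [x[-1]]
for step in range(50):
    logits = model(x)
    loss = cross_entropy_loss(logits, targets)
    model.zero_grad()
    loss.backward()
    for p in model.parameters():
        p.data -= 0.05 * p.grad     # vanilla SGD update
    if step % 10 == 0:
        print(step, loss.data)

# Sample a few characters from the trained model.
print(generate(model, encode("h"), max_new_tokens=5))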