Tokenization and Batching

1 / 5

Vocab

chars = sorted(set(text))
stoi (char -> id), itos (id -> char) for reversible mapping
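
A minimal sketch of the vocab step, assuming a character-level tokenizer. The string assigned to text is a placeholder corpus and vocab_size is a name introduced here for illustration; neither comes from the slides.

```python
# Placeholder corpus (assumption); in practice this would be the full training text.
text = "hello world"

# Sorting the unique characters makes the id assignment reproducible across runs.
chars = sorted(set(text))

# stoi maps character -> integer id, itos maps integer id -> character.
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

# Hypothetical convenience name for the number of distinct tokens.
vocab_size = len(chars)
```

Because each character gets exactly one id and each id exactly one character, the two dicts are inverses of each other.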

2 / 5

Encode / Decode

encode(s) -> [ids]
decode(ids) -> string
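
One way encode and decode might look on top of stoi and itos. The placeholder corpus is an assumption, and this sketch makes no attempt to handle characters missing from the vocab (encode would raise KeyError).

```python
# Placeholder corpus (assumption) used only to build the lookup tables.
text = "hello world"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

def encode(s):
    """Map a string to a list of integer ids, one per character."""
    return [stoi[ch] for ch in s]

def decode(ids):
    """Map a list of integer ids back to the original string."""
    return "".join(itos[i] for i in ids)

ids = encode("hello")
roundtrip = decode(ids)
```

The round trip decode(encode(s)) == s holds for any s made only of characters in the vocab.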

3 / 5

Train/Val split

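Deterministic split by index: tokens before a fixed cut point are train, the rest are val. No shuffling, so the split is identical on every run.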
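
A sketch of an index-based split. The function name train_val_split and the 0.9 default fraction are assumptions for illustration; the slides do not state a ratio.

```python
def train_val_split(data, train_frac=0.9):
    """Cut the sequence at a fixed index; no shuffling is involved."""
    n = int(train_frac * len(data))  # index of the first validation token
    return data[:n], data[n:]

# Stand-in for an encoded corpus (assumption).
data = list(range(10))
train_data, val_data = train_val_split(data)
```

Keeping the split contiguous, rather than sampling random positions, means no validation window overlaps the training text.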

4 / 5

Batch sampling

For each batch:

  • pick random start i, with 0 <= i <= len(data) - T - 1
  • x = data[i:i+T]
  • y = data[i+1:i+T+1]

Pass a seeded RNG into the sampler so batches are reproducible.
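
The sampling steps above can be sketched as follows. The function name get_batch and the batch-size parameter B are assumptions, and the standard-library random.Random stands in for whatever RNG the real code uses.

```python
import random

def get_batch(data, T, B, rng):
    """Sample B (input, target) windows of length T from data.

    rng is passed in explicitly so the caller controls the seed.
    """
    xs, ys = [], []
    for _ in range(B):
        # randint is inclusive on both ends; the upper bound keeps
        # the target slice data[i+1:i+T+1] inside the sequence.
        i = rng.randint(0, len(data) - T - 1)
        xs.append(data[i:i + T])
        ys.append(data[i + 1:i + T + 1])  # targets are inputs shifted by one
    return xs, ys

# Stand-in for an encoded corpus (assumption).
data = list(range(20))

# Two RNGs with the same seed produce the same batches.
x1, y1 = get_batch(data, T=4, B=3, rng=random.Random(0))
x2, y2 = get_batch(data, T=4, B=3, rng=random.Random(0))
```

Passing the RNG as an argument, instead of relying on global random state, keeps sampling repeatable even when other code draws random numbers in between.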

5 / 5