Compiler Foundations: Language, Tokens, Grammar

Lesson, slides, and applied problem sets.

Lesson

Compiler Foundations: Language, Tokens, Grammar

Why this module exists

A compiler is a pipeline that turns text into meaning and then into execution. If the foundation is shaky, everything downstream is harder. This module sets the shared vocabulary we will use in later lessons and quizzes.

We will use a tiny expression language throughout the pack. The details are small on purpose so you can focus on the ideas.

1) The compiler pipeline (big picture)

A typical pipeline looks like this:

Lexing: text -> tokens
Parsing: tokens -> AST (Abstract Syntax Tree)
Semantic analysis: name resolution, type checks, rules
Lowering / IR: AST -> simpler intermediate form
Optimization: improve the IR
Codegen / execution: IR -> bytecode or machine code

Not every compiler uses every stage, but the mental model stays useful.

2) Language design is a set of choices

There is no single "correct" compiler design. You choose:

What the syntax looks like
Which errors are recoverable
How strict or permissive the rules are
Which IR fits your goals (readable, fast, easy to optimize)

Our pack is open-ended on design, but specific on behavior so tests can be reliable.

3) Tokens vs lexemes

Lexeme: the exact text (e.g., "foo", "123", "==")
Token: the category + metadata (kind, text, position)

Example source:

let x = 1 + 2;

Example tokens:

LET (lexeme let)
IDENT (lexeme x)
ASSIGN (lexeme =)
NUMBER (lexeme 1)
PLUS (lexeme +)
NUMBER (lexeme 2)
SEMICOLON (lexeme ;)

Tokens carry positions so errors can point to a line and column.

4) Grammar and precedence

A grammar describes valid structure. A simple expression grammar might be:

expr     -> term ( ("+" | "-") term )*
term     -> factor ( ("*" | "/") factor )*
factor   -> NUMBER | IDENT | "(" expr ")"

The grammar encodes precedence (multiplication before addition) and associativity (left to right for + and -).

5) ASTs are about meaning, not punctuation

An AST keeps the essential structure and drops syntax noise. For 1 + 2 * 3:

The AST will drive later stages like lowering or bytecode generation.

6) Errors should be readable

The goal is not "no errors" but good errors:

Report the location (line, column)
Say what was expected vs what was found
Keep messages short and clear

Readability beats cleverness here.

Module Items

Compiler Foundations Checkpoint
Pipeline stages, tokens, grammar, and AST basics.
Quiz

Lesson

Compiler Foundations: Language, Tokens, Grammar

Why this module exists

1) The compiler pipeline (big picture)

2) Language design is a set of choices

3) Tokens vs lexemes

4) Grammar and precedence

5) ASTs are about meaning, not punctuation

6) Errors should be readable

Module Items

Compiler Foundations Checkpoint