Lexing: From Text to Tokens


Why this module exists

Before a parser can understand structure, we need to turn raw characters into a clean stream of tokens. A good lexer is small, predictable, and easy to debug.


1) The core loop

A lexer usually has one loop:

  1. Skip whitespace and comments
  2. Read the next token (identifier, number, operator, punctuation)
  3. Repeat until EOF

This is simple and reliable.
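
To make the loop concrete, here is a minimal sketch in Rust. The Lexer struct, the token names, and the # line-comment syntax are illustrative choices, not part of the lesson; only numbers and punctuation are lexed so the loop itself stays visible.

    #[derive(Debug)]
    enum Token {
        Number(String),
        Punct(char),
        Eof,
    }

    struct Lexer {
        src: Vec<char>,
        pos: usize,
    }

    impl Lexer {
        fn peek(&self) -> Option<char> {
            self.src.get(self.pos).copied()
        }

        // The whole lexer is this one loop: skip trivia, read one token, repeat.
        fn next_token(&mut self) -> Token {
            // 1. Skip whitespace and line comments ('#' is a stand-in syntax).
            loop {
                match self.peek() {
                    Some(c) if c.is_whitespace() => self.pos += 1,
                    Some('#') => {
                        while !matches!(self.peek(), None | Some('\n')) {
                            self.pos += 1;
                        }
                    }
                    _ => break,
                }
            }
            // 3. Stop at end of input.
            let Some(c) = self.peek() else { return Token::Eof };
            // 2. Read the next token (only numbers and punctuation here).
            if c.is_ascii_digit() {
                let start = self.pos;
                while matches!(self.peek(), Some(d) if d.is_ascii_digit()) {
                    self.pos += 1;
                }
                Token::Number(self.src[start..self.pos].iter().collect())
            } else {
                self.pos += 1;
                Token::Punct(c)
            }
        }
    }

    fn main() {
        let mut lx = Lexer { src: "1 + 2  # add".chars().collect(), pos: 0 };
        loop {
            let tok = lx.next_token();
            println!("{:?}", tok);
            if matches!(tok, Token::Eof) { break; }
        }
    }

Running this on "1 + 2  # add" prints Number("1"), Punct('+'), Number("2"), Eof; the comment never reaches the token stream.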


2) Longest match wins

For operators like = and ==, always check the two-character form first; this longest-match rule is sometimes called maximal munch.

Examples:

  • == is one token, not two = tokens
  • <= is one token, not < and =
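
A sketch of that rule, assuming a helper that is handed the character buffer and a cursor; the function name and the exact operator set are illustrative.

    // Try the two-character operator before falling back to one character.
    fn lex_operator(src: &[char], pos: &mut usize) -> Option<String> {
        let first = *src.get(*pos)?;
        let second = src.get(*pos + 1).copied();
        // Two-character candidates are checked first, so "==" and "<="
        // each come out as a single token.
        let two = match (first, second) {
            ('=', Some('=')) => Some("=="),
            ('!', Some('=')) => Some("!="),
            ('<', Some('=')) => Some("<="),
            ('>', Some('=')) => Some(">="),
            _ => None,
        };
        if let Some(op) = two {
            *pos += 2;
            return Some(op.to_string());
        }
        // Fall back to the one-character form.
        if "=<>!+-*/".contains(first) {
            *pos += 1;
            return Some(first.to_string());
        }
        None
    }

    fn main() {
        let src: Vec<char> = "<= == = <".chars().collect();
        let mut pos = 0;
        while pos < src.len() {
            if src[pos].is_whitespace() { pos += 1; continue; }
            println!("{:?}", lex_operator(&src, &mut pos));
        }
    }

On "<= == = <" this yields four tokens, <=, ==, =, and <, which is exactly the behavior the bullets above describe.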

3) Identifiers and keywords

Identifiers and keywords share one scanning rule:

  • Start with a letter or _
  • Continue with letters, digits, or _

After scanning an identifier, compare its lexeme to the keyword list (let, fn, if, ...). If it matches, emit the keyword token; otherwise, emit an identifier token.
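
A sketch of that two-step approach, using the lesson's example keywords; the function and token names are made up for illustration.

    #[derive(Debug)]
    enum Token {
        Keyword(String),
        Ident(String),
    }

    // The caller has already seen a letter or '_', so we scan the whole
    // lexeme first and only then decide whether it is a keyword.
    fn lex_ident(src: &[char], pos: &mut usize) -> Token {
        const KEYWORDS: [&str; 3] = ["let", "fn", "if"];
        let start = *pos;
        // Continue with letters, digits, or '_'.
        while *pos < src.len() && (src[*pos].is_alphanumeric() || src[*pos] == '_') {
            *pos += 1;
        }
        let lexeme: String = src[start..*pos].iter().collect();
        if KEYWORDS.contains(&lexeme.as_str()) {
            Token::Keyword(lexeme)
        } else {
            Token::Ident(lexeme)
        }
    }

    fn main() {
        for word in ["let", "letter", "fn", "_tmp1"] {
            let chars: Vec<char> = word.chars().collect();
            println!("{:?}", lex_ident(&chars, &mut 0));
        }
    }

Scanning the whole lexeme before the lookup is what keeps letter from being misread as the keyword let followed by ter.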


4) Positions matter

Track the start position (line + column) of each token. That data makes errors useful later.

Tip: update the line and column as you advance through the input. On a newline, increment the line and reset the column to 1.
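
One way to do that bookkeeping, with a hypothetical Cursor type; counting one column per character is a simplification (tabs and wide glyphs complicate real column counts).

    struct Cursor {
        line: u32,
        col: u32,
    }

    impl Cursor {
        // Called once per consumed character.
        fn advance(&mut self, c: char) {
            if c == '\n' {
                // Newline: the next character starts a new line at column 1.
                self.line += 1;
                self.col = 1;
            } else {
                self.col += 1;
            }
        }
    }

    fn main() {
        let mut cur = Cursor { line: 1, col: 1 };
        for c in "let x\nfn".chars() {
            println!("{:?} at {}:{}", c, cur.line, cur.col);
            cur.advance(c);
        }
    }

Record cur.line and cur.col before scanning each token so the stored position points at the token's first character, not its end.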


5) Keep it readable

A readable lexer is easier to extend. Straightforward loops beat clever tricks.

