Bytecode Format: A Real Compile Artifact

Lesson, slides, and applied problem sets.

Why this module exists

A compiler feels "real" when it produces an artifact that can be saved, shared, and executed later. We will define a simple, readable bytecode format and implement encode/decode for it.

This is not about compression or speed. It is about clarity.


1) The BC1 text format

We use a line-based text format with a small header:

BC1
FUNC add a b
LOAD a
LOAD b
ADD
RETURN
END
MAIN
PUSH_NUM 1
PUSH_NUM 2
CALL add 2
END

Rules:

  • The first non-empty line must be the header BC1
  • FUNC <name> <param...> starts a function section
  • MAIN starts the main section
  • END ends the current section
  • Empty lines and lines starting with # are ignored
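The rules above can be sketched as a small decoder. This is a minimal illustration, not a reference implementation: the function name decode_bc1 and the dictionary layout it returns are hypothetical, and it assumes that comment lines may appear before the header and that surrounding whitespace on a line is not significant. Instructions are kept as raw text here; splitting them into opcode and operands is a separate step.

```python
def decode_bc1(text):
    """Split BC1 text into function sections and a main section (a sketch)."""
    # Ignore empty lines and lines starting with '#'.
    # Assumption: indentation is not significant, so lines are stripped first.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines or lines[0] != "BC1":
        raise ValueError("missing BC1 header")

    program = {"functions": {}, "main": None}  # hypothetical result layout
    current = None  # instruction list of the section that is currently open
    for ln in lines[1:]:
        head, _, rest = ln.partition(" ")
        if current is None:
            if head == "FUNC":
                # FUNC <name> <param...>
                name, *params = rest.split()
                current = []
                program["functions"][name] = {"params": params, "code": current}
            elif ln == "MAIN":
                current = []
                program["main"] = current
            else:
                raise ValueError(f"instruction outside a section: {ln!r}")
        elif ln == "END":
            current = None  # close the current section
        elif head in ("FUNC", "MAIN"):
            raise ValueError(f"section started before END: {ln!r}")
        else:
            current.append(ln)  # keep the raw instruction text
    if current is not None:
        raise ValueError("section not closed with END")
    return program
```

Running this on the sample program from section 1 would give one function, add, with parameters a and b and four instructions, plus a main section with three instructions.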

2) Instruction encoding

Each instruction occupies one line: an opcode followed by zero or more space-separated operands. Examples:

  • PUSH_NUM 42
  • PUSH_STR "hello"
  • LOAD x
  • CALL add 2
  • JUMP 12

String literals use double quotes with escapes: \", \\, \n, \t.
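One possible way to encode and decode these literals is sketched below. The helper names encode_str, decode_str, and decode_instruction are hypothetical, and the sketch assumes that any escape other than the four listed is an error. Non-string operands are returned as text, since the number format is not specified here.

```python
# Escape table for the four escapes listed above, and its inverse.
ESCAPES = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\t": "\\t"}
UNESCAPES = {'"': '"', "\\": "\\", "n": "\n", "t": "\t"}


def encode_str(value):
    """Turn a Python string into a double-quoted BC1 literal."""
    return '"' + "".join(ESCAPES.get(ch, ch) for ch in value) + '"'


def decode_str(literal):
    """Turn a double-quoted BC1 literal back into a Python string."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string literal must be double-quoted")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 1  # look at the character after the backslash
            if i >= len(body) or body[i] not in UNESCAPES:
                raise ValueError("unknown or incomplete escape")
            out.append(UNESCAPES[body[i]])
        elif ch == '"':
            raise ValueError("unescaped quote inside literal")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def decode_instruction(line):
    """Split one instruction line into a tuple (opcode, operand...)."""
    opcode, _, rest = line.strip().partition(" ")
    if opcode == "PUSH_STR":
        # The operand may contain spaces, so it is not split on whitespace.
        return (opcode, decode_str(rest.strip()))
    return (opcode, *rest.split())
```

Treating PUSH_STR separately matters because a string operand can contain spaces; every other instruction in the examples can be split on whitespace.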


3) Why text?

A text format can be read, diffed, and edited with ordinary tools, which makes it easy to debug and inspect. Once you understand the pipeline, you can switch to a binary format.

