Go Runtime Performance

Lesson, slides, and applied problem sets.


Lesson

Go Runtime Performance: Allocations, GC, Goroutines, Memory Model, Generics

This module is a deep, practical reference for performance‑critical Go code. It focuses on runtime behavior (allocations, garbage collection, scheduler), safety constraints (memory model), and tradeoffs (generics vs interface vs monomorphized patterns). It also includes a measurement playbook.

Measurement playbook (do this first)

  1. Reproduce: isolate the workload in a benchmark or minimal harness.
  2. Profile: capture CPU, allocation, and execution-trace profiles with pprof and go tool trace (CPU shows where time goes, alloc profiles show churn, the trace shows scheduler/GC/syscall behavior).
  3. Classify: determine if the bottleneck is CPU, allocation/GC, lock contention, syscalls, or I/O.
  4. Hypothesize: form 1–3 candidate fixes (algorithmic, data layout, batching, caching).
  5. Verify: implement one change, re‑measure. Keep the fastest correct version.
  6. Guard: lock in performance with micro‑benchmarks and allocation checks.

Tooling: concrete commands and what they show

  • go test -bench . -benchmem → allocs/op and bytes/op for every benchmark.
  • go test -run none -bench BenchmarkFoo -benchmem -cpuprofile cpu.out -memprofile mem.out → capture CPU + heap profiles.
  • go tool pprof -http=:0 cpu.out → browse hot functions, call graphs, and flame graphs in the web UI.
  • go test -run none -bench BenchmarkFoo -trace trace.out + go tool trace trace.out → goroutine scheduling, GC, syscalls.
  • GODEBUG=gctrace=1 → GC lines like: gc 15 @1.234s 0%: 0.45+1.2+0.02 ms clock, 4->6->3 MB, 8 MB goal
    • Watch heap size (live set) and GC pause; high allocation rate + large live set = trouble.

Allocations and escape analysis

  • Goal: keep hot‑path data on the stack and reuse buffers to avoid GC pressure.
  • Use go test -bench . -benchmem to track allocs/op and bytes/op.
  • Common allocation sources:
    • Converting []byte to string (copy) and string to []byte (copy)
    • fmt.Sprintf / fmt.Errorf
    • append growth when capacity is insufficient
    • map growth when no size hint is given
    • interface{} / generics boxing when values escape
  • Use preallocation: make([]T, 0, n) for slices, make(map[K]V, n) for maps.
  • Prefer slicing into existing buffers over new allocations.
  • Investigate escapes with go build -gcflags=all=-m and look for “escapes to heap”.
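Several of these tactics combine in one pattern: preallocate a byte buffer and use append-style formatting instead of fmt. A sketch (joinIDs is an illustrative name, not a library function):

```go
package main

import (
	"fmt"
	"strconv"
)

// joinIDs builds "1,22,333" from a slice of ints. It preallocates a
// byte buffer and uses strconv.AppendInt, avoiding per-element
// fmt.Sprintf allocations; the only copy is the final string().
func joinIDs(ids []int) string {
	buf := make([]byte, 0, len(ids)*4) // rough capacity guess up front
	for i, id := range ids {
		if i > 0 {
			buf = append(buf, ',')
		}
		buf = strconv.AppendInt(buf, int64(id), 10)
	}
	return string(buf) // one copy at the end
}

func main() {
	fmt.Println(joinIDs([]int{1, 22, 333})) // 1,22,333
}
```

Check whether buf stays on the stack with `go build -gcflags=all=-m` on this file.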

Garbage collector (GC) essentials

  • Go uses a concurrent, tri‑color, mark‑and‑sweep GC with a pacer.
  • Mark cost scales with the live heap size (pointers to trace); GC frequency scales with the allocation rate.
  • Two critical metrics: allocation rate and live heap size.
  • Tactics:
    • Reduce long‑lived heap objects (move hot data to stack, or reuse buffers).
    • Prefer compact, contiguous data structures to reduce pointer‑tracing.
    • Avoid per‑request object graphs in hot paths.
  • Tuning knobs:
    • GOGC (default 100): higher → fewer GCs, more memory; lower → more GCs, less memory.
    • GOMEMLIMIT (Go 1.19+): a soft memory limit; the GC runs more aggressively as the heap approaches it.
  • Heap size matters more than churn; shrinking the live set is often the biggest win.

Goroutines and scheduler

  • Goroutines are cheap but not free. Each has a stack that grows on demand.
  • Scheduler contention and synchronization are frequent real bottlenecks.
  • Guidelines:
    • Use coarse‑grained goroutines; avoid spawning per small task.
    • Batch work to amortize synchronization.
    • Minimize shared mutable state; shard state if possible.
    • Measure with go test -run none -bench . -benchmem; add -trace trace.out for scheduler insights.
  • Mental model: M/P/G (machine threads, processors, goroutines). Blocking syscalls park an M; cgo can pin threads.
  • For CPU‑bound work, use worker pools; for I/O‑bound work, goroutine‑per‑request is usually fine.
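The worker-pool guideline can be sketched as follows; squares is an illustrative task, and the pool is sized to GOMAXPROCS since the work is CPU-bound:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// squares fans indices out to a fixed pool of workers, one per
// logical CPU: coarse-grained goroutines instead of one per task.
func squares(nums []int) []int {
	out := make([]int, len(nums))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.GOMAXPROCS(0); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = nums[i] * nums[i] // each index written once: no lock needed
			}
		}()
	}
	for i := range nums {
		jobs <- i
	}
	close(jobs) // lets the range loops in the workers finish
	wg.Wait()
	return out
}

func main() {
	fmt.Println(squares([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```

For real workloads, send batches of indices per channel operation to amortize synchronization, per the batching guideline above.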

Memory model (correctness first)

  • The Go memory model guarantees that a read observes a write only when synchronization establishes a happens‑before relationship between them; without one, there is no such guarantee.
  • Use sync.Mutex, sync/atomic, channels, or other synchronization to establish happens‑before.
  • Avoid data races even if they "seem to work"; races invalidate compiler and CPU assumptions.
  • If performance requires lock‑free patterns, validate with -race and micro‑benchmarks.
  • Example (racy):
var done bool
var x int
go func() {
    x = 1
    done = true
}()
for !done {} // may spin forever: nothing forces the write to become visible
fmt.Println(x) // data race; no happens-before orders x = 1 before this read
  • Example (safe):
var x int
done := make(chan struct{})
go func() {
    x = 1
    close(done)
}()
<-done
fmt.Println(x) // safe

Generics performance notes

  • Generics reduce code duplication but can hide costs if values escape.
  • Performance depends on constraints and inlining:
    • Constrained types can enable inlining and avoid interface boxing.
    • Use ~ type sets to allow optimized operations on underlying types.
  • Beware of generic functions that return interface values or store into any.
  • For hot paths, measure both generic and hand‑specialized versions; choose the faster and clearer one.
  • Prefer constraints like constraints.Integer/constraints.Float when you need ops without interface boxing.
  • Avoid returning any from generic helpers; it forces interface allocation for non‑pointer values.
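A minimal sketch of a ~ type-set constraint; Number, Sum, and Celsius are illustrative names:

```go
package main

import "fmt"

// Number uses ~ type sets, so Sum also accepts named types whose
// underlying type is listed — without interface boxing.
type Number interface {
	~int | ~int64 | ~float64
}

// Sum returns T, not any, so non-pointer results are not forced
// into an interface allocation.
func Sum[T Number](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

type Celsius float64 // named type: admitted by the ~float64 term

func main() {
	fmt.Println(Sum([]int{1, 2, 3}))      // 6
	fmt.Println(Sum([]Celsius{1.5, 2.5})) // 4
}
```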

Zero‑allocation patterns

  • Accept []byte inputs and write into caller‑provided buffers.
  • Avoid building intermediate strings; parse from byte slices.
  • Compute the required output size early and return io.ErrShortBuffer if the caller's buffer is too small.
  • Use arenas or buffer pools for batch processing.

Common pitfalls

  • Mixing bytes.Buffer with repeated String() conversions.
  • Using fmt on hot paths (parsing/formatting allocate).
  • Forcing escapes by storing pointers in interfaces or capturing them in closures.
  • Excessively fine‑grained locks around small data.
