Autograd: Scalar Reverse-Mode
Lesson, slides, and applied problem sets.
Goal
Build a tiny scalar autograd engine. The forward pass builds a DAG of Value nodes; the backward pass applies the chain rule to compute gradients.
Prerequisites: ML Foundations pack.
1) The Value node
Each Value stores:
- data: float
- grad: float accumulator for dL/d(this)
- _prev: set of parent Values
- _op: debug label
- _backward: closure to push gradients to parents
Everything else (vectors, matrices) is just lists of Value.
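A minimal sketch of the node, matching the constructor used in the snippets below (the default arguments and __repr__ are assumptions, not requirements):

class Value:
    """One scalar node in the computation DAG."""

    def __init__(self, data, _children=(), _op=""):
        self.data = data                 # forward value
        self.grad = 0.0                  # accumulator for dL/d(this)
        self._prev = set(_children)      # parent Values that produced this node
        self._op = _op                   # debug label, e.g. "+" or "tanh"
        self._backward = lambda: None    # set per-op; pushes out.grad to parents

    def __repr__(self):
        return f"Value(data={self.data}, grad={self.grad})"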
2) Local derivatives on the forward pass
Every operation returns a new Value and defines how gradients flow:
# a + b
out = Value(a.data + b.data, (a, b), "+")
def _backward():
    a.grad += 1.0 * out.grad
    b.grad += 1.0 * out.grad
out._backward = _backward
# a * b
out = Value(a.data * b.data, (a, b), "*")
def _backward():
    a.grad += b.data * out.grad
    b.grad += a.data * out.grad
out._backward = _backward
Use += rather than = because a node can feed into several downstream operations, and each of them contributes to its gradient.
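The same pattern covers the nonlinearities. A sketch for tanh, whose local derivative is 1 - tanh(x)^2 (written here as a method on Value, with math from the standard library):

import math

# Inside class Value:
def tanh(self):
    t = math.tanh(self.data)        # compute once, reuse in the closure
    out = Value(t, (self,), "tanh")
    def _backward():
        # d/dx tanh(x) = 1 - tanh(x)**2
        self.grad += (1.0 - t * t) * out.grad
    out._backward = _backward
    return out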
3) Backward pass (reverse topological order)
def backward(self):
    topo = []
    visited = set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(self)
    self.grad = 1.0
    for node in reversed(topo):
        node._backward()
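A quick sanity check once backward() is in place: build a small expression, run the backward pass, and compare against a finite difference. A sketch, assuming + and * are wired up as operator overloads on Value:

a = Value(2.0)
b = Value(-3.0)
c = a * b + a          # dc/da = b + 1 = -2, dc/db = a = 2
c.backward()
print(a.grad, b.grad)  # expect -2.0 and 2.0

# Finite-difference check on a
eps = 1e-6
f = lambda x: x * -3.0 + x
print((f(2.0 + eps) - f(2.0 - eps)) / (2 * eps))  # ~ -2.0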
4) Supported ops
- Arithmetic: +, *, **, - (unary and binary), /
- Reverse ops: __radd__, __rmul__, __rsub__, __rtruediv__
- Nonlinear: tanh, relu, exp, log (only defined for data > 0)
All ops accept Value or Python numbers.
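One common way to get there (an assumed design, not the only option) is to coerce plain numbers at the top of each primitive op and to define the remaining operators in terms of the primitives, so they need no new derivative rules:

# Inside class Value:
def __add__(self, other):
    other = other if isinstance(other, Value) else Value(other)  # accept plain numbers
    out = Value(self.data + other.data, (self, other), "+")
    def _backward():
        self.grad += out.grad
        other.grad += out.grad
    out._backward = _backward
    return out

# Derived and reverse ops reuse +, * and ** instead of new derivatives.
def __neg__(self):             return self * -1.0
def __sub__(self, other):      return self + (-other)
def __truediv__(self, other):  return self * other**-1
def __radd__(self, other):     return self + other
def __rmul__(self, other):     return self * other
def __rsub__(self, other):     return (-self) + other
def __rtruediv__(self, other): return self**-1 * other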
5) Gradient accumulation
Gradients accumulate by default:
for p in params:
    p.grad = 0.0
loss.backward()
This is intentional: it lets you accumulate gradients over several minibatches before a single update, as sketched below.
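A typical loop then zeros once per update and may call backward() several times before stepping. A sketch; model, loss_fn, minibatches, params, lr, and num_steps are hypothetical placeholders:

for step in range(num_steps):
    # Zero the accumulators before this update's backward passes.
    for p in params:
        p.grad = 0.0

    # Accumulate gradients over one or more minibatches.
    for batch in minibatches:
        loss = loss_fn(model, batch)
        loss.backward()        # adds into p.grad for every parameter

    # Gradient descent step.
    for p in params:
        p.data -= lr * p.grad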
Key takeaways
- Autograd is a DAG + local derivatives.
- Reverse topological order ensures correctness.
- Gradients accumulate; you must zero them between steps.
Next: build Module, Linear, and Sequential to compose models.