Autograd: Building an Automatic Differentiation Engine
Lesson, slides, and applied problem sets.
Goal
Build a minimal scalar autograd engine (micrograd-style). The forward pass builds a DAG of Value nodes; the backward pass applies the chain rule to compute dL/dx for every node.
Prerequisites: ML Foundations pack.
1) The Value node (scalar)
A Value stores:
- data: float, the value computed in the forward pass
- grad: float, accumulator for dL/d(this)
- _prev: set of parent Values
- _op: debug label (e.g., "+", "*")
- _backward: closure that pushes out.grad into parents
All later "tensors" are just nested lists of Value.
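A minimal constructor consistent with this layout might look like the sketch below; the parameter names _children and _op are illustrative choices, not a required API.
class Value:
    """Scalar node in the computation graph."""
    def __init__(self, data, _children=(), _op=""):
        self.data = float(data)        # value computed in the forward pass
        self.grad = 0.0                # accumulator for dL/d(this)
        self._prev = set(_children)    # parent Values that produced this node
        self._op = _op                 # debug label, e.g. "+" or "*"
        self._backward = lambda: None  # leaves have nothing to propagate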
2) Local derivatives on the forward pass
Each op constructs a new Value and records how gradients flow to its parents.
# a + b
out = Value(a.data + b.data, (a, b), "+")
def _backward():
    a.grad += 1.0 * out.grad
    b.grad += 1.0 * out.grad
out._backward = _backward
# a * b
out = Value(a.data * b.data, (a, b), "*")
def _backward():
    a.grad += b.data * out.grad
    b.grad += a.data * out.grad
out._backward = _backward
Note the +=: a node can feed multiple downstream paths, so the gradient contributions from each path must accumulate (see the check below).
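A quick check of that accumulation behavior, assuming the backward method from section 4 is in place: reusing a node on two paths should sum both contributions.
x = Value(3.0)
y = x * x          # x feeds y through both operands
y.backward()
print(x.grad)      # 6.0 == dy/dx = 2*x; with plain "=" one path's gradient would be lost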
3) Supported ops (core)
Arithmetic:
__add__, __mul__, __pow__ (int/float exponent), __neg__, __sub__, __truediv__
- reverse ops: __radd__, __rmul__, __rsub__, __rtruediv__
Nonlinear:
tanh, relu, exp, log (defined only for data > 0)
All ops should accept a Value or a Python number; numbers are wrapped as Value.
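A sketch of how this might look for one arithmetic op and one nonlinearity, continuing the Value class from section 1; the structure is one reasonable choice based on the rules above, not a definitive implementation.
import math

class Value:
    # ... constructor and remaining ops as sketched above ...

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)  # wrap plain numbers
        out = Value(self.data * other.data, (self, other), "*")
        def _backward():
            self.grad += other.data * out.grad   # d(out)/d(self) = other
            other.grad += self.data * out.grad   # d(out)/d(other) = self
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), "tanh")
        def _backward():
            self.grad += (1.0 - t * t) * out.grad  # d/dx tanh(x) = 1 - tanh(x)^2
        out._backward = _backward
        return out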
4) Backward pass (reverse topological order)
We need parents to receive gradients only after their children are ready. Use a DFS topological sort, then traverse in reverse.
def backward(self):
    topo = []
    visited = set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for child in v._prev:
                build(child)
            topo.append(v)
    build(self)
    self.grad = 1.0
    for node in reversed(topo):
        node._backward()
self.grad = 1.0 implements dL/dL = 1 for the loss node.
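A small end-to-end check of the ordering, assuming the arithmetic ops from sections 2-3 (values chosen for illustration): in the diamond-shaped graph below, the shared node s must collect gradient from both p and q before its own _backward runs, which is exactly what the reverse topological order guarantees.
x = Value(2.0)
s = x + 1.0        # shared intermediate
p = s * 3.0
q = s * 5.0
out = p + q        # out = 3*(x+1) + 5*(x+1)
out.backward()
print(x.grad)      # 8.0 == d(out)/dx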
5) Gradient accumulation
grad accumulates by design. For iterative optimization, zero grads before each new backward:
for p in params:
    p.grad = 0.0
loss.backward()
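In context, one optimization step might look like the sketch below; params, compute_loss, and the learning rate are placeholders for whatever model and loss you build on top of the engine.
lr = 0.01
for p in params:
    p.grad = 0.0              # clear stale gradients from the previous step
loss = compute_loss()         # forward pass builds a fresh graph
loss.backward()               # fills p.grad for every parameter in the graph
for p in params:
    p.data -= lr * p.grad     # plain SGD update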
6) Sanity check
a = Value(2.0)
b = Value(-3.0)
c = Value(10.0)
f = (a * b + c) ** 2
f.backward()
# df/da = 2*(a*b + c) * b = -24
# df/db = 2*(a*b + c) * a = 16
# df/dc = 2*(a*b + c) * 1 = 8
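The analytic gradients can also be cross-checked numerically with central differences; eps here is an illustrative choice.
def f_plain(a_val, b_val=-3.0, c_val=10.0):
    return (a_val * b_val + c_val) ** 2   # same function on plain floats

eps = 1e-5
numeric_dfda = (f_plain(2.0 + eps) - f_plain(2.0 - eps)) / (2 * eps)
print(numeric_dfda)   # ~ -24.0, matching a.grad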
Key takeaways
- Autograd is just a DAG of scalar nodes plus local derivatives.
- The forward pass builds the graph; the backward pass applies the chain rule.
- Reverse topological order is required for correctness.
- Gradients accumulate; zero them explicitly between steps.
Next: build Module, Linear, and Sequential to compose networks cleanly.
Module Items
Autograd Value: Building the Computational Graph
Implement the Value class that forms the foundation of automatic differentiation.
Autograd Backward: Automatic Gradient Computation
Implement backward pass with topological sorting and activation functions.
Autograd Checkpoint
Test your understanding of automatic differentiation.
Autograd MLP: Neural Network with Automatic Differentiation
Build and train a complete MLP using the autograd engine.