Neural Network Abstractions

Lesson, slides, and applied problem sets.

Lesson

Goal

Create a small PyTorch-like API for composing models:

  • Module base class
  • Linear layer
  • Sequential container
  • Activation modules

Prerequisites: Autograd module.


1) Module contract

A Module should:

  1. Be callable: __call__ -> forward
  2. Track parameters automatically
  3. Track nested modules recursively
  4. Support train/eval flags

2) Auto-registration via __setattr__

class Module:
    def __init__(self):
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_modules", {})
        object.__setattr__(self, "training", True)

    def __setattr__(self, name, value):
        if isinstance(value, Value):
            self._parameters[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        elif isinstance(value, (list, tuple)):
            self._register_list(name, value)
        object.__setattr__(self, name, value)

    def _register_list(self, name, values):
        # Register any Value or Module found inside a list/tuple,
        # keyed by "name.index" so containers are tracked too.
        for i, v in enumerate(values):
            if isinstance(v, Value):
                self._parameters[f"{name}.{i}"] = v
            elif isinstance(v, Module):
                self._modules[f"{name}.{i}"] = v
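A minimal sketch of the registration behavior (list handling omitted for brevity; `Value` here is a data-only stand-in for the Autograd lesson's class, and `Demo` is a hypothetical subclass):

```python
class Value:
    # Data-only stand-in for the autograd Value; only its type matters here.
    def __init__(self, data):
        self.data = data

class Module:
    def __init__(self):
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_modules", {})
        object.__setattr__(self, "training", True)

    def __setattr__(self, name, value):
        if isinstance(value, Value):
            self._parameters[name] = value   # leaf parameter
        elif isinstance(value, Module):
            self._modules[name] = value      # nested submodule
        object.__setattr__(self, name, value)

class Demo(Module):
    def __init__(self):
        super().__init__()
        self.w = Value(0.5)       # registered as a parameter
        self.child = Module()     # registered as a submodule
        self.label = "demo"       # plain attribute, not tracked

m = Demo()
print(sorted(m._parameters))  # ['w']
print(sorted(m._modules))     # ['child']
```

Note that `object.__setattr__` is used inside `__init__` so that creating the registries does not itself trigger the overridden `__setattr__` before they exist.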

3) Parameters API

def parameters(self):
    params = list(self._parameters.values())
    for m in self._modules.values():
        params.extend(m.parameters())
    return params


def named_parameters(self, prefix=""):
    for name, p in self._parameters.items():
        full = f"{prefix}{name}" if prefix else name
        yield full, p
    for name, m in self._modules.items():
        sub = f"{prefix}{name}." if prefix else f"{name}."
        yield from m.named_parameters(sub)
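The recursion over `_modules` is what makes nested parameters visible from the root. A self-contained sketch (again with a stand-in `Value` and hypothetical `Inner`/`Outer` classes) showing the dotted names it produces:

```python
class Value:
    # Data-only stand-in for the autograd Value class.
    def __init__(self, data):
        self.data = data

class Module:
    def __init__(self):
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        if isinstance(value, Value):
            self._parameters[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def parameters(self):
        params = list(self._parameters.values())
        for m in self._modules.values():
            params.extend(m.parameters())  # recurse into submodules
        return params

    def named_parameters(self, prefix=""):
        for name, p in self._parameters.items():
            full = f"{prefix}{name}" if prefix else name
            yield full, p
        for name, m in self._modules.items():
            sub = f"{prefix}{name}." if prefix else f"{name}."
            yield from m.named_parameters(sub)

class Inner(Module):
    def __init__(self):
        super().__init__()
        self.b = Value(0.0)

class Outer(Module):
    def __init__(self):
        super().__init__()
        self.w = Value(1.0)
        self.inner = Inner()

net = Outer()
print(len(net.parameters()))                   # 2
print([n for n, _ in net.named_parameters()])  # ['w', 'inner.b']
```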

4) Linear layer

Shapes:

  • input x: length in_features
  • weight: (out_features, in_features)
  • bias: (out_features,) or None

Forward:

y_i = sum_j weight[i][j] * x[j] + bias[i]

Initialization: weights drawn uniformly from [-k, k] with k = 1/sqrt(in_features) (the same default as PyTorch's nn.Linear; note this differs from Xavier/Glorot uniform, whose bound also depends on out_features).
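A sketch of the layer using plain floats, to focus on the shapes, the forward sum, and the initialization bound; in the real module each entry would be a `Value` and the class would subclass `Module`:

```python
import random

class Linear:
    def __init__(self, in_features, out_features, bias=True):
        k = in_features ** -0.5  # k = 1/sqrt(in_features)
        # weight has shape (out_features, in_features)
        self.weight = [[random.uniform(-k, k) for _ in range(in_features)]
                       for _ in range(out_features)]
        self.bias = [0.0] * out_features if bias else None

    def __call__(self, x):
        # y_i = sum_j weight[i][j] * x[j] + bias[i]
        out = [sum(w * xj for w, xj in zip(row, x)) for row in self.weight]
        if self.bias is not None:
            out = [o + b for o, b in zip(out, self.bias)]
        return out

layer = Linear(3, 2)
y = layer([1.0, 2.0, 3.0])
print(len(y))  # 2
```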


5) Sequential + activations

Sequential applies its child modules in order, feeding each output into the next. Activations are modules with no parameters:

  • Tanh, ReLU, Sigmoid
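The chaining and a parameter-free activation can be sketched on plain floats (the real versions subclass `Module` and operate on `Value` objects; `Scale` is a hypothetical stand-in for a Linear layer):

```python
import math

class Sequential:
    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:  # apply each module in order
            x = layer(x)
        return x

class Tanh:
    # Activation module: no parameters, just an elementwise function.
    def __call__(self, x):
        return [math.tanh(v) for v in x]

class Scale:  # hypothetical stand-in for a Linear layer
    def __call__(self, x):
        return [2.0 * v for v in x]

model = Sequential(Scale(), Tanh())
print(model([0.0, 100.0]))  # [0.0, 1.0] -- tanh saturates for large inputs
```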

Key takeaways

  1. Module centralizes parameter tracking and nesting.
  2. __setattr__ is the hook that makes it automatic.
  3. Linear + activations + Sequential are enough to build MLPs.

Next: tokenization and batching for language modeling.

