Neural Network Abstractions: Module, Linear, Sequential


Goal

Replace ad-hoc classes with a small framework that matches PyTorch's mental model:

  • Module base class with automatic parameter tracking
  • Linear layer and common activations
  • Sequential container for composition

Prerequisites: Autograd module.


1) The Module contract

A Module should:

  1. Be callable via __call__ -> forward
  2. Track parameters (Values) automatically
  3. Track nested modules recursively
  4. Support train/eval mode flags

Minimal state:

  • _parameters: dict[str, Value]
  • _modules: dict[str, Module]
  • training: bool

2) Auto-registration via __setattr__

When a user assigns attributes, intercept and register:

class Module:
    def __init__(self):
        # Bypass __setattr__ here: the registries must exist before it runs.
        object.__setattr__(self, "_parameters", {})
        object.__setattr__(self, "_modules", {})
        object.__setattr__(self, "training", True)

    def __setattr__(self, name, value):
        # Route each assignment into the matching registry, then store the
        # attribute normally so it stays accessible as self.<name>.
        if isinstance(value, Value):
            self._parameters[name] = value
        elif isinstance(value, Module):
            self._modules[name] = value
        elif isinstance(value, (list, tuple)):
            self._register_list(name, value)
        object.__setattr__(self, name, value)

Nested lists/tuples should register recursively (e.g., list of layers).
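
A minimal sketch of the two remaining pieces of the contract: __call__ dispatching to forward, and _register_list walking nested lists/tuples. The dotted-key convention here is an assumption, not fixed by the lesson:

def __call__(self, *args, **kwargs):
    return self.forward(*args, **kwargs)

def _register_list(self, name, values):
    # Walk a (possibly nested) list/tuple and register every Value and Module
    # under a dotted key, e.g. "weight.0.3".
    for i, item in enumerate(values):
        key = f"{name}.{i}"
        if isinstance(item, Value):
            self._parameters[key] = item
        elif isinstance(item, Module):
            self._modules[key] = item
        elif isinstance(item, (list, tuple)):
            self._register_list(key, item)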


3) Parameter traversal

def parameters(self):
    params = list(self._parameters.values())
    for module in self._modules.values():
        params.extend(module.parameters())
    return params

def named_parameters(self, prefix=""):
    for name, param in self._parameters.items():
        full = f"{prefix}{name}" if prefix else name
        yield full, param
    for name, module in self._modules.items():
        sub = f"{prefix}{name}." if prefix else f"{name}."
        yield from module.named_parameters(sub)
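
A brief usage sketch, assuming Value carries a grad field as in the autograd module and model is a hypothetical Module instance:

# Reset every gradient before a backward pass.
for p in model.parameters():
    p.grad = 0.0

# Inspect parameters by their dotted names.
for name, p in model.named_parameters():
    print(name, p.grad)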

4) Linear layer (fully connected)

Shapes:

  • input x: length in_features
  • weight: (out_features, in_features)
  • bias: (out_features,) or None

Forward:

y_i = sum_j weight[i][j] * x[j] + bias[i]

Initialization: weights and bias drawn uniformly from [-k, k] with k = 1/sqrt(in_features), matching PyTorch's default for nn.Linear.
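
A minimal sketch of Linear under these shapes, assuming the Value class from the autograd module supports +, *, and construction from a float; the nested weight list is tracked by _register_list through __setattr__:

import random

class Linear(Module):
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        k = in_features ** -0.5
        # weight[i][j] connects input j to output i; drawn uniformly from [-k, k]
        self.weight = [[Value(random.uniform(-k, k)) for _ in range(in_features)]
                       for _ in range(out_features)]
        self.bias = [Value(random.uniform(-k, k)) for _ in range(out_features)] if bias else None

    def forward(self, x):
        out = []
        for i, row in enumerate(self.weight):
            # y_i = sum_j weight[i][j] * x[j] + bias[i]
            y = Value(0.0)
            for w, xj in zip(row, x):
                y = y + w * xj
            if self.bias is not None:
                y = y + self.bias[i]
            out.append(y)
        return out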


5) Sequential container

Sequential stores modules in order and feeds each module's output into the next:

class Sequential(Module):
    def __init__(self, *modules):
        super().__init__()
        # Register each submodule under a unique name so __setattr__ tracks it.
        for i, m in enumerate(modules):
            setattr(self, f"layer_{i}", m)
        # Keep an ordered reference for forward(); store it with object.__setattr__
        # so the same modules are not registered a second time via the list path.
        object.__setattr__(self, "module_list", list(modules))

    def forward(self, x):
        for m in self.module_list:
            x = m(x)
        return x
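
A hypothetical usage example, assuming the Linear and activation sketches above:

# Two-layer MLP on 3-dimensional inputs.
mlp = Sequential(
    Linear(3, 16),
    Tanh(),
    Linear(16, 1),
)
y = mlp([Value(0.5), Value(-1.0), Value(2.0)])  # a list containing one Value
print(len(mlp.parameters()))                    # 3*16 + 16 + 16*1 + 1 = 81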

6) Activation modules

Each activation maps a list of Value to a list of Value:

  • Tanh: xi.tanh()
  • ReLU: xi.relu()
  • Sigmoid: 1 / (1 + exp(-x))

These have no parameters but still behave like Modules.
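
A sketch of the three activations as parameter-free modules; it assumes Value exposes tanh(), relu(), and exp(), and supports negation, addition with a float, and a float power:

class Tanh(Module):
    def forward(self, x):
        return [xi.tanh() for xi in x]

class ReLU(Module):
    def forward(self, x):
        return [xi.relu() for xi in x]

class Sigmoid(Module):
    def forward(self, x):
        # sigmoid(x) = 1 / (1 + exp(-x)), applied elementwise
        return [((-xi).exp() + 1.0) ** -1 for xi in x]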


7) Train/Eval mode

def train(self, mode=True):
    self.training = mode
    for m in self._modules.values():
        m.train(mode)
    return self

def eval(self):
    return self.train(False)

Used by layers like dropout/batchnorm (not implemented here, but the flag is part of the API).
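
A brief usage sketch (model is a hypothetical Module instance):

model.eval()              # sets training = False on model and every submodule
assert not model.training
model.train()             # back to training mode; returns self, so calls chain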


Key takeaways

  1. Module centralizes parameter tracking and nesting.
  2. __setattr__ is the hook that makes it automatic.
  3. Linear + activations + Sequential are enough for many models.
  4. Train/eval mode is an API contract, even if unused now.

Next: embeddings and positional encodings for sequence models.

