From Logistic Regression to Neural Networks
The XOR Problem
(0,0)→0 (0,1)→1
(1,0)→1 (1,1)→0
No single straight line can separate the two classes → logistic regression fails
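A quick check (a minimal sketch; scikit-learn's LogisticRegression with default settings is assumed here, not part of the notes):

import numpy as np
from sklearn.linear_model import LogisticRegression

# The four XOR points and their labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))   # stays below 1.0: no linear boundary classifies all four points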
Why Hidden Layers
- Transform input space
- Make non-linear problems linearly separable (see the sketch after this list)
- Activation functions add non-linearity
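Sketch of the idea with hand-picked weights (the values 20 / -20 / -10 / 30 / -30 are illustrative, not from the notes): one hidden unit acts like OR, the other like NAND, and the output unit ANDs them together, which is exactly XOR.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked weights: hidden unit 1 ~ OR, hidden unit 2 ~ NAND
W1 = np.array([[20.0, -20.0],
               [20.0, -20.0]])
b1 = np.array([-10.0, 30.0])

# Output unit ~ AND of the two hidden units -> XOR
W2 = np.array([[20.0], [20.0]])
b2 = np.array([-30.0])

h = sigmoid(X @ W1 + b1)        # hidden space: now linearly separable
y_pred = sigmoid(h @ W2 + b2)   # one linear boundary in hidden space
print(np.round(y_pred).ravel()) # [0. 1. 1. 0.]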
Network Architecture
Input → [W1, b1] → sigmoid → [W2, b2] → sigmoid → Output
Sigmoid Activation
sigmoid(z) = 1 / (1 + exp(-z))
sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
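As NumPy code (a direct transcription of the two formulas above; the function names are mine):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)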
Forward Pass
z1 = x @ W1 + b1       # Hidden linear (W1: n_in x n_hidden)
h = sigmoid(z1)        # Hidden activation
z2 = h @ W2 + b2       # Output linear (W2: n_hidden x n_out)
y_pred = sigmoid(z2)   # Output activation
Backpropagation
dz2 = y_pred - y_true        # Output error (BCE + sigmoid)
dW2 = h.T @ dz2              # Output weights grad
db2 = dz2.sum(axis=0)        # Output bias grad
dh = dz2 @ W2.T              # Hidden error
dz1 = dh * h * (1 - h)       # Through sigmoid derivative
dW1 = x.T @ dz1              # Hidden weights grad
db1 = dz1.sum(axis=0)        # Hidden bias grad
Training Loop
- Forward pass → get predictions
- Compute loss (BCE)
- Backward pass → get gradients
- Update weights and biases: W = W - lr * dW, b = b - lr * db (full sketch below)
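Putting the four steps together, a minimal end-to-end sketch for XOR (hidden size 4, lr = 0.5, 10000 epochs, and the random seed are illustrative choices, not prescribed above):

import numpy as np

rng = np.random.default_rng(0)

# XOR data
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_true = np.array([[0], [1], [1], [0]], dtype=float)

# Small random init (never all zeros)
W1 = rng.normal(0, 0.5, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
for epoch in range(10000):
    # Forward pass
    h = sigmoid(x @ W1 + b1)
    y_pred = sigmoid(h @ W2 + b2)
    # BCE loss (epsilon keeps log() finite)
    loss = -np.mean(y_true * np.log(y_pred + 1e-8) + (1 - y_true) * np.log(1 - y_pred + 1e-8))
    # Backward pass (same formulas as in the Backpropagation section)
    dz2 = y_pred - y_true
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * h * (1 - h)
    dW1, db1 = x.T @ dz1, dz1.sum(axis=0)
    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(y_pred).ravel())   # typically [0. 1. 1. 0.] after training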
Weight Initialization
- Never all zeros (symmetry problem)
- Small random values
- Xavier/Glorot: std = sqrt(2 / (fan_in + fan_out)) (see the sketch below)
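A minimal sketch of Xavier init for this network (normal variant; the helper name xavier_init is mine):

import numpy as np

def xavier_init(fan_in, fan_out, rng):
    # Xavier/Glorot (normal variant): std = sqrt(2 / (fan_in + fan_out))
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W1 = xavier_init(2, 4, rng)          # input -> hidden
W2 = xavier_init(4, 1, rng)          # hidden -> output
b1, b2 = np.zeros(4), np.zeros(1)    # biases can start at zero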
Debugging
- Loss not decreasing → check learning rate
- NaN loss → add a small epsilon inside the log (see the sketch below)
- Stuck at 50% accuracy → try more hidden neurons or more epochs
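For the NaN case, the usual fix is to keep the log argument away from 0 and 1; a minimal sketch (the eps value and the clipping approach are illustrative):

import numpy as np

def bce_loss(y_true, y_pred, eps=1e-8):
    # Clip predictions so log() never sees exactly 0 or 1
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))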
Key Insight
Linear models: limited to linear decision boundaries
Neural networks: learn to transform the space, then apply a linear boundary
Bridge to Deep Learning
Same pattern, more layers:
[Linear → Activation] × N → Output
Frameworks automate the gradients; you understand the math.
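One concrete example (PyTorch is my choice here; the notes don't name a framework): the same 2-layer XOR network with automatic gradients.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 4), nn.Sigmoid(),   # [Linear -> Activation]
    nn.Linear(4, 1), nn.Sigmoid(),   # output layer
)

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

loss_fn = nn.BCELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.5)   # illustrative hyperparameters

for _ in range(10000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()        # autograd replaces the hand-written backprop
    opt.step()

with torch.no_grad():
    print(model(X).round().ravel())   # typically tensor([0., 1., 1., 0.])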