From Logistic Regression to Neural Networks
The XOR Problem
(0,0)→0 (0,1)→1
(1,0)→1 (1,1)→0
No single straight line can separate the two classes → logistic regression fails
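A quick check (a minimal sketch; scikit-learn's LogisticRegression with default settings is assumed here, not part of the notes):

import numpy as np
from sklearn.linear_model import LogisticRegression

# The four XOR points and their labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))   # stays below 1.0: no linear boundary classifies all four points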
Why Hidden Layers
- Transform input space
- Make non-linear problems linearly separable (see the sketch after this list)
- Activation functions add non-linearity
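Sketch of the idea with hand-picked weights (the values 20 / -20 / -10 / 30 / -30 are illustrative, not from the notes): one hidden unit acts like OR, the other like NAND, and the output unit ANDs them together, which is exactly XOR.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked weights: hidden unit 1 ~ OR, hidden unit 2 ~ NAND
W1 = np.array([[20.0, -20.0],
               [20.0, -20.0]])
b1 = np.array([-10.0, 30.0])

# Output unit ~ AND of the two hidden units -> XOR
W2 = np.array([[20.0], [20.0]])
b2 = np.array([-30.0])

h = sigmoid(X @ W1 + b1)        # hidden space: now linearly separable
y_pred = sigmoid(h @ W2 + b2)   # one linear boundary in hidden space
print(np.round(y_pred).ravel()) # [0. 1. 1. 0.]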
Network Architecture
Input → [W1, b1] → sigmoid → [W2, b2] → sigmoid → Output
Sigmoid Activation
sigmoid(z) = 1 / (1 + exp(-z))
sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
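As NumPy code (a direct transcription of the two formulas above; the function names are mine):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)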
Forward Pass
z1 = x @ W1 + b1       # Hidden linear (W1: n_in x n_hidden)
h = sigmoid(z1)        # Hidden activation
z2 = h @ W2 + b2       # Output linear (W2: n_hidden x n_out)
y_pred = sigmoid(z2)   # Output activation
Backpropagation
dz2 = y_pred - y_true        # Output error (BCE + sigmoid)
dW2 = h.T @ dz2              # Output weights grad
db2 = dz2.sum(axis=0)        # Output bias grad
dh = dz2 @ W2.T              # Hidden error
dz1 = dh * h * (1 - h)       # Through sigmoid derivative
dW1 = x.T @ dz1              # Hidden weights grad
db1 = dz1.sum(axis=0)        # Hidden bias grad
Training Loop
- Forward pass → get predictions
- Compute loss (BCE)
- Backward pass → get gradients
- Update weights and biases: W = W - lr * dW, b = b - lr * db (full sketch below)
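Putting the four steps together, a minimal end-to-end sketch for XOR (hidden size 4, lr = 0.5, 10000 epochs, and the random seed are illustrative choices, not prescribed above):

import numpy as np

rng = np.random.default_rng(0)

# XOR data
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_true = np.array([[0], [1], [1], [0]], dtype=float)

# Small random init (never all zeros)
W1 = rng.normal(0, 0.5, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
for epoch in range(10000):
    # Forward pass
    h = sigmoid(x @ W1 + b1)
    y_pred = sigmoid(h @ W2 + b2)
    # BCE loss (epsilon keeps log() finite)
    loss = -np.mean(y_true * np.log(y_pred + 1e-8) + (1 - y_true) * np.log(1 - y_pred + 1e-8))
    # Backward pass (same formulas as in the Backpropagation section)
    dz2 = y_pred - y_true
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * h * (1 - h)
    dW1, db1 = x.T @ dz1, dz1.sum(axis=0)
    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(y_pred).ravel())   # typically [0. 1. 1. 0.] after training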
Weight Initialization
- Never all zeros (symmetry problem)
- Small random values
- Xavier/Glorot: std = sqrt(2 / (fan_in + fan_out)) (see the sketch below)
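A minimal sketch of Xavier init for this network (normal variant; the helper name xavier_init is mine):

import numpy as np

def xavier_init(fan_in, fan_out, rng):
    # Xavier/Glorot (normal variant): std = sqrt(2 / (fan_in + fan_out))
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W1 = xavier_init(2, 4, rng)          # input -> hidden
W2 = xavier_init(4, 1, rng)          # hidden -> output
b1, b2 = np.zeros(4), np.zeros(1)    # biases can start at zero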
Debugging
- Loss not decreasing → check learning rate
- NaN loss → add a small epsilon inside the log (see the sketch below)
- Stuck at 50% accuracy → try more hidden neurons or more epochs
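For the NaN case, the usual fix is to keep the log argument away from 0 and 1; a minimal sketch (the eps value and the clipping approach are illustrative):

import numpy as np

def bce_loss(y_true, y_pred, eps=1e-8):
    # Clip predictions so log() never sees exactly 0 or 1
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))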
Key Insight
Linear models: limited to linear decision boundaries
Neural networks: learn to transform the space, then apply a linear boundary
Bridge to Deep Learning
Same pattern, more layers:
[Linear → Activation] × N → Output
Frameworks automate the gradients; you understand the math.
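One concrete example (PyTorch is my choice here; the notes don't name a framework): the same 2-layer XOR network with automatic gradients.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(2, 4), nn.Sigmoid(),   # [Linear -> Activation]
    nn.Linear(4, 1), nn.Sigmoid(),   # output layer
)

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

loss_fn = nn.BCELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.5)   # illustrative hyperparameters

for _ in range(10000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()        # autograd replaces the hand-written backprop
    opt.step()

with torch.no_grad():
    print(model(X).round().ravel())   # typically tensor([0., 1., 1., 0.])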