Regularization
Overfitting
- Train error low, test error high
- Model memorizes noise
- Too complex for data
Bias-Variance Tradeoff
- Bias: too simple, wrong assumptions
- Variance: too sensitive to training data
- Goal: minimize total error = bias² + variance + noise (decomposition below)
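For squared error this tradeoff has an exact form. A standard decomposition, assuming a regression target with irreducible noise variance σ²:

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\mathrm{Bias}\big[\hat{f}(x)\big]^2}_{\text{too simple}}
  + \underbrace{\mathrm{Var}\big[\hat{f}(x)\big]}_{\text{too sensitive}}
  + \underbrace{\sigma^2}_{\text{irreducible}}
```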
L2 Regularization (Ridge)
- Penalty = λ × Σw²
- Shrinks weights toward zero
- Keeps all features: weights shrink but rarely reach exactly zero (sketch below)
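A minimal sketch of ridge regression via the closed-form normal equations; NumPy only, the toy data and λ values are illustrative, and the intercept term is omitted for brevity:

```python
import numpy as np

# Adding lam * sum(w**2) to the squared error gives the closed form:
#   w = (X^T X + lam * I)^(-1) X^T y
def ridge_fit(X, y, lam=1.0):
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Toy data: weights shrink toward zero as lam grows, but stay nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)
for lam in (0.01, 1.0, 100.0):
    print(lam, np.round(ridge_fit(X, y, lam), 3))
```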
L1 Regularization (Lasso)
- Penalty = λ × Σ|w|
- Drives some weights to exactly zero
- Automatic feature selection (sketch below)
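A sketch using scikit-learn's Lasso, whose alpha parameter plays the role of λ above (sklearn also divides the squared-error term by 2n, so the values aren't directly comparable to the ridge sketch); the toy data is illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Same toy problem: two features are irrelevant (true weight 0).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

for alpha in (0.01, 0.1, 1.0):
    w = Lasso(alpha=alpha).fit(X, y).coef_
    print(alpha, np.round(w, 3))  # irrelevant weights land on exactly 0.0
```

As alpha grows, the irrelevant weights are zeroed first, which is the feature-selection effect.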
Dropout
- Randomly drop neurons during training
- Forces redundancy
- Use all neurons at test time, rescaled so expected activations match (sketch below)
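A sketch of inverted dropout, which does the rescaling during training so that test time can use all neurons unchanged; drop_prob and the toy activations are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_prob=0.5, training=True):
    # Inverted dropout: rescale at train time so expected activations match.
    if not training or drop_prob == 0.0:
        return activations                          # test time: all neurons active
    mask = rng.random(activations.shape) >= drop_prob
    return activations * mask / (1.0 - drop_prob)   # rescale survivors

h = np.ones((2, 4))
print(dropout(h))                   # roughly half zeroed, survivors doubled
print(dropout(h, training=False))   # unchanged at test time
```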
Early Stopping
- Stop when validation loss stops improving (loop sketched below)
- Save best model
- Simple and effective
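A generic early-stopping loop with a patience window; train_one_epoch and validate are hypothetical caller-supplied callbacks standing in for whatever your framework provides, and the demo loss curve is synthetic:

```python
import copy

def fit_with_early_stopping(model, train_one_epoch, validate,
                            max_epochs=100, patience=5):
    # train_one_epoch(model) and validate(model) -> loss are caller-supplied.
    best_loss, best_state, bad_epochs = float("inf"), None, 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        val_loss = validate(model)
        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
            best_state = copy.deepcopy(model)    # save best model so far
        else:
            bad_epochs += 1
            if bad_epochs >= patience:           # no improvement for `patience` epochs
                break
    return best_state

# Demo with a synthetic validation curve that improves, then degrades.
curve = iter([1.0, 0.6, 0.4, 0.35, 0.37, 0.41, 0.48, 0.55, 0.60, 0.70])
best = fit_with_early_stopping({}, lambda m: None, lambda m: next(curve), patience=3)
```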
Data Augmentation
- Create synthetic examples
- Images: rotate, flip, crop (flip and crop sketched below)
- More diversity → better generalization
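A minimal sketch of two of the image augmentations above, a random horizontal flip and a jittered crop, in plain NumPy on a grayscale image; the pad width and toy image are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    # Random horizontal flip plus a random crop, padded back to size.
    if rng.random() < 0.5:
        image = image[:, ::-1]                    # horizontal flip
    padded = np.pad(image, 2, mode="reflect")     # pad so crops keep the size
    top, left = rng.integers(0, 5, size=2)        # random 2-pixel jitter
    h, w = image.shape
    return padded[top:top + h, left:left + w]

img = np.arange(16.0).reshape(4, 4)
print(augment(img))  # a slightly shifted / possibly flipped variant of img
```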
Choosing λ
- Cross-validate over a grid of candidate values (sketch below)
- Too small: overfitting
- Too large: underfitting
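One way to run that cross-validation is scikit-learn's RidgeCV, which sweeps a grid of candidate alphas (its name for λ) with k-fold CV; the data and grid here are illustrative:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] * 2.0 - X[:, 1] + rng.normal(scale=0.5, size=200)

alphas = np.logspace(-3, 3, 13)                  # candidates from 0.001 to 1000
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print(model.alpha_)                              # best cross-validated lambda
```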