Feature Engineering
Why Scale Features?
- Gradient descent converges faster when features share a scale
- Distance metrics (k-NN, k-means) aren't dominated by large-magnitude features
- Regularization penalizes all coefficients equally
Min-Max Normalization
- x' = (x - min) / (max - min)
- Scales to [0, 1]
- Use for bounded inputs
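A minimal NumPy sketch of the formula above; the guard for a constant feature (max = min) is an added assumption to avoid division by zero:

```python
import numpy as np

def min_max_scale(x):
    """Scale values to [0, 1] via x' = (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    rng = x.max() - x.min()
    if rng == 0:  # constant feature: map everything to 0 (assumption)
        return np.zeros_like(x)
    return (x - x.min()) / rng

ages = np.array([18, 25, 40, 60])
scaled = min_max_scale(ages)
print(scaled)  # min maps to 0.0, max maps to 1.0
```

In practice you would compute min and max on the training set only and reuse them at inference time, so test data is scaled consistently.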
Standardization
- z = (x - mean) / std
- Mean=0, Std=1
- Use when features have different units or scales
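The same idea for z-scores, as a small sketch; `heights_cm` is an illustrative example, not from the notes:

```python
import numpy as np

def standardize(x):
    """z = (x - mean) / std, giving the result mean 0 and std 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
z = standardize(heights_cm)
```

Unlike min-max scaling, the output is not bounded, but it is far less sensitive to a single extreme outlier stretching the range.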
One-Hot Encoding
- Binary column per category
- e.g. [0, 1, 0] for "blue" given categories [red, blue, green]
- Avoids implying a false ordering between categories
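A pure-Python sketch of the encoding; the `colors` list matches the "[0, 1, 0] for blue" example above:

```python
def one_hot(value, categories):
    """One binary column per category; exactly one 1 per row."""
    return [1 if value == c else 0 for c in categories]

colors = ["red", "blue", "green"]
encoded = one_hot("blue", colors)
print(encoded)  # [0, 1, 0]
```

Contrast this with label encoding (red=0, blue=1, green=2), which would falsely tell a linear model that green is "twice" blue.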
Missing Values
- Drop rows (loses data)
- Impute with mean/mode
- Add "was missing" indicator
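The last two bullets combined, as a hedged NumPy sketch (the helper name and sample data are illustrative):

```python
import numpy as np

def impute_mean_with_indicator(x):
    """Replace NaNs with the column mean and add a 'was missing' flag."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    filled = np.where(missing, np.nanmean(x), x)  # mean of non-missing values
    return filled, missing.astype(int)

x = np.array([2.0, np.nan, 4.0])
filled, flag = impute_mean_with_indicator(x)
# filled -> [2.0, 3.0, 4.0], flag -> [0, 1, 0]
```

The indicator column preserves the signal that a value was missing, which imputation alone would erase; for categorical features, swap the mean for the mode.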
Feature Creation
- Polynomial: x², x₁×x₂
- Log transform for skewed data
- Domain-specific combinations
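The first two bullets in NumPy; `log1p` (log(1 + x)) is used here as a common zero-safe variant of the log transform, and the arrays are illustrative:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([10.0, 20.0, 30.0])

squared = x1 ** 2        # polynomial term x1^2
interaction = x1 * x2    # interaction term x1 * x2
log_x2 = np.log1p(x2)    # log transform to compress a skewed scale
# squared -> [1.0, 4.0, 9.0], interaction -> [10.0, 40.0, 90.0]
```

Polynomial and interaction terms let a linear model fit curved or coupled relationships; domain-specific combinations (e.g. price per square metre from price and area) often beat generic ones.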
Feature Selection
- Filter: correlation, mutual info
- Wrapper: try feature subsets
- Embedded: L1 regularization
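A sketch of the simplest of the three approaches, a correlation filter; the function name, synthetic data, and choice of Pearson correlation are assumptions for illustration:

```python
import numpy as np

def correlation_filter(X, y, k):
    """Filter selection: keep the k features most |correlated| with y."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]  # indices of the top-k features

rng = np.random.default_rng(0)
y = np.linspace(0.0, 1.0, 100)
X = np.column_stack([
    2.0 * y,                              # perfectly correlated with y
    rng.normal(size=100),                 # pure noise
    y + 0.1 * rng.normal(size=100),       # strongly correlated with y
])
top = correlation_filter(X, y, 2)  # should pick columns 0 and 2
```

Filters like this are cheap but score each feature in isolation; wrapper and embedded methods (e.g. L1/lasso driving weak coefficients to zero) account for interactions between features at higher cost.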