Feature Engineering
Why Scale Features?
- Gradient descent converges faster when features share a scale
- Distance metrics (k-NN, k-means) aren't dominated by large-magnitude features
- Regularization penalizes all coefficients equally
Min-Max Normalization
- x' = (x - min) / (max - min)
- Scales to [0, 1]
- Use for bounded inputs
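A minimal NumPy sketch of the formula above; the guard for a constant feature (max = min) is an added assumption to avoid division by zero:

```python
import numpy as np

def min_max_scale(x):
    """Scale values to [0, 1] via x' = (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    rng = x.max() - x.min()
    if rng == 0:  # constant feature: map everything to 0 (assumption)
        return np.zeros_like(x)
    return (x - x.min()) / rng

ages = np.array([18, 25, 40, 60])
scaled = min_max_scale(ages)
print(scaled)  # min maps to 0.0, max maps to 1.0
```

In practice you would compute min and max on the training set only and reuse them at inference time, so test data is scaled consistently.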
Standardization
- z = (x - mean) / std
- Mean=0, Std=1
- Use when features have different units or scales
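The same idea for z-scores, as a small sketch; `heights_cm` is an illustrative example, not from the notes:

```python
import numpy as np

def standardize(x):
    """z = (x - mean) / std, giving the result mean 0 and std 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
z = standardize(heights_cm)
```

Unlike min-max scaling, the output is not bounded, but it is far less sensitive to a single extreme outlier stretching the range.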
One-Hot Encoding
- Binary column per category
- e.g. [0, 1, 0] for "blue" given categories [red, blue, green]
- Avoids implying a false ordering between categories
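A pure-Python sketch of the encoding; the `colors` list matches the "[0, 1, 0] for blue" example above:

```python
def one_hot(value, categories):
    """One binary column per category; exactly one 1 per row."""
    return [1 if value == c else 0 for c in categories]

colors = ["red", "blue", "green"]
encoded = one_hot("blue", colors)
print(encoded)  # [0, 1, 0]
```

Contrast this with label encoding (red=0, blue=1, green=2), which would falsely tell a linear model that green is "twice" blue.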
Missing Values
- Drop rows (loses data)
- Impute with mean/mode
- Add "was missing" indicator
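The last two bullets combined, as a hedged NumPy sketch (the helper name and sample data are illustrative):

```python
import numpy as np

def impute_mean_with_indicator(x):
    """Replace NaNs with the column mean and add a 'was missing' flag."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    filled = np.where(missing, np.nanmean(x), x)  # mean of non-missing values
    return filled, missing.astype(int)

x = np.array([2.0, np.nan, 4.0])
filled, flag = impute_mean_with_indicator(x)
# filled -> [2.0, 3.0, 4.0], flag -> [0, 1, 0]
```

The indicator column preserves the signal that a value was missing, which imputation alone would erase; for categorical features, swap the mean for the mode.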
Feature Creation
- Polynomial: x², x₁×x₂
- Log transform for skewed data
- Domain-specific combinations
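The first two bullets in NumPy; `log1p` (log(1 + x)) is used here as a common zero-safe variant of the log transform, and the arrays are illustrative:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([10.0, 20.0, 30.0])

squared = x1 ** 2        # polynomial term x1^2
interaction = x1 * x2    # interaction term x1 * x2
log_x2 = np.log1p(x2)    # log transform to compress a skewed scale
# squared -> [1.0, 4.0, 9.0], interaction -> [10.0, 40.0, 90.0]
```

Polynomial and interaction terms let a linear model fit curved or coupled relationships; domain-specific combinations (e.g. price per square metre from price and area) often beat generic ones.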
Feature Selection
- Filter: correlation, mutual info
- Wrapper: try feature subsets
- Embedded: L1 regularization
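A sketch of the simplest of the three approaches, a correlation filter; the function name, synthetic data, and choice of Pearson correlation are assumptions for illustration:

```python
import numpy as np

def correlation_filter(X, y, k):
    """Filter selection: keep the k features most |correlated| with y."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]  # indices of the top-k features

rng = np.random.default_rng(0)
y = np.linspace(0.0, 1.0, 100)
X = np.column_stack([
    2.0 * y,                              # perfectly correlated with y
    rng.normal(size=100),                 # pure noise
    y + 0.1 * rng.normal(size=100),       # strongly correlated with y
])
top = correlation_filter(X, y, 2)  # should pick columns 0 and 2
```

Filters like this are cheap but score each feature in isolation; wrapper and embedded methods (e.g. L1/lasso driving weak coefficients to zero) account for interactions between features at higher cost.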