PCA
The Goal
- Reduce dimensions
- Preserve maximum variance
- Find natural axes of data
Algorithm Steps
- Center data (subtract mean)
- Compute covariance matrix
- Eigendecomposition
- Sort by eigenvalue
- Project to top K components
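The steps above can be sketched in NumPy (a minimal illustration on toy data, not a production implementation):

```python
# Minimal PCA via eigendecomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy data: 100 samples, 3 features

X_centered = X - X.mean(axis=0)         # 1. center (subtract mean)
cov = np.cov(X_centered, rowvar=False)  # 2. covariance matrix (3 x 3)
eigvals, eigvecs = np.linalg.eigh(cov)  # 3. eigendecomposition (symmetric matrix)
order = np.argsort(eigvals)[::-1]       # 4. sort by eigenvalue, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

K = 2
components = eigvecs[:, :K]             # top-K principal components (columns)
X_new = X_centered @ components         # 5. project to K dimensions
```

`np.linalg.eigh` is used rather than `eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthogonal eigenvectors.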
Principal Components
- Eigenvectors of covariance matrix
- Orthogonal to each other
- Ordered by variance captured
Explained Variance
- Ratio per component: eigenvalue / sum(eigenvalues)
- Common rule of thumb: keep enough components to retain 95% of variance
- Plot the cumulative ratio to choose K
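Choosing K from the cumulative explained-variance ratio can be sketched as (toy data, 95% threshold assumed):

```python
# Explained variance ratio and the smallest K reaching 95% of total variance.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated toy data
X_centered = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(np.cov(X_centered, rowvar=False))[::-1]  # descending

ratio = eigvals / eigvals.sum()          # explained variance per component
cumulative = np.cumsum(ratio)            # plot this curve to pick K visually
K = int(np.searchsorted(cumulative, 0.95) + 1)  # smallest K with >= 95%
```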
Projection
- X_new = X_centered @ components
- Reconstruction: X_approx = X_new @ components.T + mean (add the mean back)
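Projection and reconstruction together, as a NumPy sketch (toy data; note the mean must be re-added after projecting back):

```python
# Project to top-2 components, then reconstruct an approximation of X.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
mean = X.mean(axis=0)
Xc = X - mean

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
components = eigvecs[:, np.argsort(eigvals)[::-1]][:, :2]  # top-2 components

X_new = Xc @ components                  # project: (50, 4) -> (50, 2)
X_approx = X_new @ components.T + mean   # reconstruct: (50, 2) -> (50, 4)
mse = np.mean((X - X_approx) ** 2)       # reconstruction error (lost variance)
```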
Important Notes
- Standardize first when features are on different scales!
- Captures only linear structure
- Sensitive to outliers (they inflate variance along their direction)
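Why standardizing matters can be shown with a small example: a feature on a much larger scale dominates the covariance matrix, so PC1 simply points along that feature (toy data, scale factor of 1000 assumed):

```python
# Without standardization, PC1 is hijacked by the large-scale feature.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
X[:, 0] *= 1000                          # feature 0 on a far larger scale

Xc = X - X.mean(axis=0)
_, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1_raw = vecs[:, -1]                    # eigenvector of the largest eigenvalue
# |pc1_raw[0]| is ~1: PC1 is essentially just the large-scale feature

Xs = Xc / X.std(axis=0)                  # standardize: unit variance per feature
_, vecs_s = np.linalg.eigh(np.cov(Xs, rowvar=False))
pc1_std = vecs_s[:, -1]                  # now both features can contribute
```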
When to Use
- Visualization (K=2 or 3)
- Noise reduction
- Speed up training
- Decorrelate features
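The decorrelation use case can be verified directly: projecting onto the principal components yields features with a diagonal covariance matrix (NumPy sketch on toy correlated data):

```python
# After projection onto the eigenvector basis, features are uncorrelated.
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(500, 3)) @ A        # correlated toy features
Xc = X - X.mean(axis=0)

_, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Z = Xc @ eigvecs                         # project onto all components

cov_Z = np.cov(Z, rowvar=False)          # off-diagonal entries are ~0
```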