Model Evaluation
Train/Test Split
- Never evaluate on training data
- Typical: 80% train, 20% test
- Overfitting hides in train metrics
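The split above can be sketched in pure Python; this is a minimal illustration (the helper name `train_test_split` and the 80/20 default are assumptions, not a specific library's API):

```python
import random

def train_test_split(data, test_frac=0.2, seed=0):
    """Shuffle indices, then hold out the last test_frac of samples for testing."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    idx = list(range(len(data)))
    rng.shuffle(idx)
    cut = int(len(data) * (1 - test_frac))
    train = [data[i] for i in idx[:cut]]
    test = [data[i] for i in idx[cut:]]
    return train, test

train, test = train_test_split(list(range(100)))
# 80 training samples, 20 test samples, no overlap
```

Metrics are then computed only on `test`, never on `train`.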
Confusion Matrix
- TN, FP, FN, TP
- Shows error types
- Foundation for other metrics
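The four cells can be counted directly from label pairs; a minimal sketch for binary 0/1 labels (the function name and tuple ordering are illustrative choices):

```python
def confusion_matrix(y_true, y_pred):
    """Return (TN, FP, FN, TP) counts for binary 0/1 labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1          # positive correctly found
        elif t == 1 and p == 0:
            fn += 1          # positive missed
        elif t == 0 and p == 1:
            fp += 1          # false alarm
        else:
            tn += 1          # negative correctly rejected
    return tn, fp, fn, tp

confusion_matrix([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
# → (2, 1, 1, 2)
```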
Precision
- TP / (TP + FP)
- Of predicted positives, how many correct?
- Prioritize when false positives are costly
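As a quick sketch of the formula (the zero-denominator guard is a common convention, assumed here):

```python
def precision(tp, fp):
    """TP / (TP + FP): of predicted positives, the fraction that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

precision(8, 2)  # → 0.8
```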
Recall
- TP / (TP + FN)
- Of actual positives, how many found?
- Prioritize when false negatives are costly
F1 Score
- 2 × P × R / (P + R)
- Harmonic mean
- Good for imbalanced data
ROC and AUC
- TPR vs FPR at various thresholds
- AUC: single number summary
- 1.0 = perfect, 0.5 = random
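AUC can also be read as the probability that a randomly chosen positive scores higher than a randomly chosen negative; a minimal O(n_pos × n_neg) sketch of that pairwise definition (ties counted as 0.5):

```python
def auc(scores, labels):
    """AUC as P(score of random positive > score of random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])  # → 1.0 (perfect separation)
```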
Cross-Validation
- K-fold: K rounds; each fold serves once as the test set
- Average scores
- More robust than single split
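The K-fold index bookkeeping can be sketched in a few lines (contiguous folds, no shuffling; the generator name is an illustrative choice):

```python
def kfold_indices(n, k):
    """Split range(n) into k near-equal folds; yield (train, test) index lists,
    with each fold used exactly once as the test set."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

A model is trained and scored once per (train, test) pair, and the K scores are averaged.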