Descriptive Statistics
Central Tendency
- Mean: sum/count, sensitive to outliers
- Median: middle value, robust
- Mode: most frequent
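
A minimal sketch of the three measures, using Python's standard `statistics` module. The sample values are made up for illustration; the single large value (95) is there to show how the mean moves while the median barely does.

```python
import statistics

# Hypothetical sample with one large outlier (95).
data = [2, 3, 3, 4, 5, 95]

mean = statistics.mean(data)      # sum / count, pulled upward by 95
median = statistics.median(data)  # average of the two middle values (3 and 4)
mode = statistics.mode(data)      # most frequent value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

The mean here lands well above every value except the outlier, while the median stays at 3.5.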
Spread
- Range: max - min
- Variance: avg squared distance from mean (divide by n for a population, n - 1 for a sample)
- Std: √variance, same units as data
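
A short sketch of the spread measures on a made-up sample, again with the standard `statistics` module. It computes both the population form (divide by n) and the sample form (divide by n - 1) of the variance, since the two are easy to mix up.

```python
import statistics

# Hypothetical sample; its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)       # max - min
pop_var = statistics.pvariance(data)     # average squared deviation (divide by n)
sample_var = statistics.variance(data)   # divide by n - 1 instead
pop_std = statistics.pstdev(data)        # square root of pop_var, same units as data

print("range:", data_range)
print("population variance:", pop_var)
print("sample variance:", sample_var)
print("population std:", pop_std)
```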
Percentiles
- 25th = Q1, 50th = median, 75th = Q3
- IQR = Q3 - Q1
- Used for outlier detection: flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
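
The quartiles and the IQR can be sketched as below. This assumes `statistics.quantiles` with its default interpolation method; other tools may interpolate differently and give slightly different quartiles. The 1.5 multiplier is the common rule-of-thumb fence, assumed here for illustration, and the data are made up.

```python
import statistics

# Hypothetical sample; 40 sits far from the rest.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Rule-of-thumb fences: 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("outside the fences:", outliers)
```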
Normal Distribution
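- Bell curve: N(μ, σ²), with mean μ and standard deviation σ
- About 68% within 1σ, 95% within 2σ, 99.7% within 3σ
- Central Limit Theorem: means of large independent samples are approximately normal, even when the underlying data are not (finite variance required)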
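
A rough sketch checking both ideas numerically. The first part uses `statistics.NormalDist` to compute the probability mass within 1σ and 2σ. The second part is a small simulation of the Central Limit Theorem: the sample size (30), the number of repetitions (2000), and the seed are arbitrary choices made for this example.

```python
import math
import random
import statistics
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def mass_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1 = mass_within(1)  # roughly 0.68
within_2 = mass_within(2)  # roughly 0.95

# CLT: average many draws from a uniform (clearly non-normal) distribution.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 30
sample_means = [
    statistics.mean([rng.random() for _ in range(sample_size)])
    for _ in range(2000)
]

center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)

# Theory: uniform(0, 1) has mean 0.5 and std sqrt(1/12),
# so sample means should have std close to sqrt(1/12) / sqrt(sample_size).
expected_spread = math.sqrt(1 / 12) / math.sqrt(sample_size)

print("within 1 sigma:", round(within_1, 4))
print("within 2 sigma:", round(within_2, 4))
print("mean of sample means:", round(center, 3))
print("std of sample means:", round(spread, 4), "expected about", round(expected_spread, 4))
```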
Correlation
- Pearson r: [-1, +1]
- Only measures linear relationship
- r = cov(X,Y) / (std_X * std_Y)
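
The formula can be written out directly. In this sketch `pearson_r` is a helper name chosen for the example, and it uses the population forms of covariance and standard deviation throughout (the n or n - 1 factors cancel as long as both use the same one). The second data set is a perfect but non-linear relationship (y = x²), which illustrates why r only captures linear association.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

# Perfectly linear: y = 2x, so r should be (numerically) 1.
linear_r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Perfectly dependent but not linear: y = x^2 on a symmetric range gives r = 0.
xs = [-2, -1, 0, 1, 2]
curved_r = pearson_r(xs, [x * x for x in xs])

print("linear:", linear_r)
print("curved:", curved_r)
```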
Z-scores
- z = (x - mean) / std
- Standardizes to mean=0, std=1
- Compare across different scales
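
A small sketch of standardization. `z_scores` is a helper name made up for this example, and it uses the population standard deviation. The same hypothetical heights are expressed in centimetres and in metres to show that z-scores do not depend on the measurement scale.

```python
import statistics

def z_scores(values):
    """Return (x - mean) / std for every x, using the population std."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

# The same hypothetical heights on two different scales.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]

z_cm = z_scores(heights_cm)
z_m = z_scores(heights_m)

print("z (cm):", [round(z, 3) for z in z_cm])
print("z (m): ", [round(z, 3) for z in z_m])
print("mean of z:", round(statistics.mean(z_cm), 6))
print("std of z:", round(statistics.pstdev(z_cm), 6))
```

Both lists of z-scores agree up to floating-point rounding, and the standardized values have mean 0 and std 1.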
Outliers
- |z| > 3 or IQR method
- Investigate before removing
- Could be errors or real signal
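
A sketch of the |z| > 3 rule that flags suspicious points for review instead of deleting them. The readings are invented, and `flag_outliers` is a hypothetical helper. Note that the rule needs a reasonably large sample: with only a handful of points, no value can reach |z| > 3, because the outlier itself inflates the standard deviation.

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Return (index, value) pairs whose |z| exceeds the threshold; nothing is removed."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs((v - mu) / sigma) > threshold
    ]

# Hypothetical readings clustered around 10, with one value of 100.
readings = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
            11, 9, 10, 12, 10, 9, 11, 10, 10, 100]

flagged = flag_outliers(readings)

# The original data stay intact; flagged points are reported for investigation.
for index, value in flagged:
    print(f"check reading #{index}: {value}")
```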