Descriptive Statistics

Central Tendency

  • Mean: sum/count, sensitive to outliers
  • Median: middle value, robust
  • Mode: most frequent
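
A minimal sketch of the three measures using Python's standard `statistics` module. The sample values are made up for illustration; the single large value shows how the mean moves while the median stays put.

```python
import statistics

# Hypothetical sample: the value 100 acts as an outlier.
data = [2, 3, 3, 4, 100]

mean = statistics.mean(data)      # sum / count, pulled upward by 100
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # 22.4 3 3
```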

Spread

  • Range: max - min
  • Variance: avg squared distance from mean
  • Std: √variance, same units as data
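
The spread measures can be sketched the same way. This assumes the population form of variance (divide by n), which matches "avg squared distance from mean"; the sample form divides by n - 1 instead. The data values are illustrative only.

```python
import statistics

# Hypothetical sample chosen so the results are round numbers.
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)       # max - min
mean = statistics.mean(data)

# Population variance written out: average squared distance from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
std = variance ** 0.5                     # same units as the data

# The library equivalents, plus the sample variance (divides by n - 1).
lib_variance = statistics.pvariance(data)
lib_std = statistics.pstdev(data)
sample_variance = statistics.variance(data)

print(value_range, variance, std, sample_variance)
```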

Percentiles

  • 25th = Q1, 50th = median, 75th = Q3
  • IQR = Q3 - Q1
  • Used for outlier detection: flag values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR
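
A short sketch of quartiles and IQR with `statistics.quantiles` (Python 3.8+). Quartile values depend on the interpolation method; `method="inclusive"` is an assumption made here, and other tools may give slightly different cut points on small samples.

```python
import statistics

# Hypothetical sample of nine values.
data = [1, 2, 3, 4, 5, 6, 7, 8, 50]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

Note that the extreme value 50 does not affect Q1, the median, or Q3, which is why the IQR is a robust measure of spread.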

Normal Distribution

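  • Bell curve: N(μ, σ²), with mean μ and std σ
  • ≈68% within 1σ, ≈95% within 2σ, ≈99.7% within 3σ
  • Central Limit Theorem: means of large samples are approximately normal, whatever the source distribution (given finite variance)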
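
The coverage figures can be checked with `statistics.NormalDist`, and the Central Limit Theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the uniform source distribution, the sample size of 30, the 2000 repetitions, and the seed are all arbitrary choices.

```python
import random
import statistics

# Standard normal distribution (mean 0, std 1).
std_normal = statistics.NormalDist(mu=0.0, sigma=1.0)

# Probability mass within 1 and 2 standard deviations of the mean.
within_1_sigma = std_normal.cdf(1) - std_normal.cdf(-1)
within_2_sigma = std_normal.cdf(2) - std_normal.cdf(-2)
print(round(within_1_sigma, 4), round(within_2_sigma, 4))

# CLT sketch: means of samples drawn from a uniform (non-normal) source.
random.seed(0)  # fixed seed so the run is repeatable
sample_size = 30
sample_means = [
    statistics.mean([random.random() for _ in range(sample_size)])
    for _ in range(2000)
]

# Theory: the means cluster around 0.5 with std sqrt(1/12) / sqrt(30).
expected_std = (1 / 12) ** 0.5 / sample_size ** 0.5
print(statistics.mean(sample_means), statistics.pstdev(sample_means), expected_std)
```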

Correlation

  • Pearson r: [-1, +1]
  • Measures only linear relationships
  • cov(X,Y) / (std_X * std_Y)
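
The formula above can be written out directly. `pearson_r` is a hypothetical helper name, and the sketch assumes two equal-length sequences that are not constant. The second example shows a perfect but nonlinear relationship that still gives r = 0.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson r as cov(X, Y) / (std_X * std_Y), population form."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

# Perfect linear relationship: y = 2x, so r is 1 (up to float rounding).
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Perfect nonlinear relationship: y = x^2 on symmetric x, so r is 0.
xs = [-2, -1, 0, 1, 2]
r_quadratic = pearson_r(xs, [x ** 2 for x in xs])

print(r_linear, r_quadratic)
```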

Z-scores

  • z = (x - mean) / std
  • Standardizes to mean=0, std=1
  • Compare across different scales
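
A minimal sketch of standardization. `z_scores` is a hypothetical helper, the heights are made-up values, and the population std is assumed. Expressing the same heights in centimetres and in inches gives the same z-scores, which is what makes comparison across scales possible.

```python
import statistics

def z_scores(values):
    """Return (x - mean) / std for each value, using the population std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]

# The same hypothetical heights on two different scales.
heights_cm = [150, 160, 170, 180, 190]
heights_in = [h / 2.54 for h in heights_cm]

z_cm = z_scores(heights_cm)
z_in = z_scores(heights_in)

print(z_cm)
print(z_in)

# After standardizing, the mean is 0 and the std is 1 (up to rounding).
print(statistics.mean(z_cm), statistics.pstdev(z_cm))
```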

Outliers

  • |z| > 3 or IQR method
  • Investigate before removing
  • Could be errors or real signal
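
Both detection rules can be sketched as follows. The data is invented, the 1.5 multiplier on the IQR is the conventional choice rather than something fixed by these notes, and the quartile method is an assumption. The code only flags candidates; whether to keep or drop them is still a judgment call.

```python
import statistics

# Hypothetical readings clustered near 10, plus one suspicious value.
data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10,
        10, 11, 9, 10, 12, 8, 10, 11, 9, 10, 100]

# Rule 1: |z| > 3, using the population mean and std of the data.
mean = statistics.mean(data)
std = statistics.pstdev(data)
z_flagged = [x for x in data if abs((x - mean) / std) > 3]

# Rule 2: IQR fences at 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_flagged = [x for x in data if x < lower or x > upper]

print(z_flagged, iqr_flagged)  # [100] [100]
```

One caveat on the z-score rule: the outlier itself inflates the mean and std, so in small samples no value can reach |z| > 3. The IQR rule is less affected because quartiles are robust.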