Statistics From Scratch
Implement fundamental statistical functions from scratch. These are essential for understanding data and building ML models.
Functions to implement
1. mean(data)
Compute the arithmetic mean (average).
- Input: A list of numbers
- Output: The sum divided by the count
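A minimal sketch of what mean could look like, using only built-ins. Raising ValueError on an empty list is an assumption; the exercise only asks for edge cases to be handled appropriately.

```python
def mean(data):
    """Arithmetic mean: the sum of the values divided by their count."""
    if not data:
        # Assumption: an empty input is an error rather than 0 or None.
        raise ValueError("mean() requires at least one value")
    return sum(data) / len(data)


print(mean([1, 2, 3, 4, 5]))  # 3.0
```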
2. variance(data, population=True)
Compute the variance.
- Input: A list of numbers, and whether to use population variance
- Output: Average squared deviation from mean
- Population variance divides by n, sample variance divides by (n-1)
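One possible shape for variance is sketched below. The only difference between the two modes is the divisor: n for population, n - 1 for sample. The ValueError cases (empty input, or fewer than two values for sample variance) are assumptions about what counts as appropriate edge-case handling.

```python
def variance(data, population=True):
    """Average squared deviation from the mean."""
    n = len(data)
    if n == 0:
        # Assumption: an empty input is an error.
        raise ValueError("variance() requires at least one value")
    if not population and n < 2:
        # Sample variance divides by n - 1, which would be zero here.
        raise ValueError("sample variance requires at least two values")

    mu = sum(data) / n
    squared_deviations = sum((x - mu) ** 2 for x in data)
    divisor = n if population else n - 1
    return squared_deviations / divisor


print(variance([1, 2, 3, 4, 5]))                    # 2.0
print(variance([1, 2, 3, 4, 5], population=False))  # 2.5
```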
3. std(data, population=True)
Compute the standard deviation.
- Input: A list of numbers, and whether to use population std
- Output: Square root of variance
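As a sketch, std can be written as the square root of the variance. To keep the example self-contained, the variance is computed inline here; in a full solution it would more naturally call variance(data, population). The error handling is an assumption.

```python
import math


def std(data, population=True):
    """Standard deviation: the square root of the variance."""
    n = len(data)
    if n == 0:
        # Assumption: an empty input is an error.
        raise ValueError("std() requires at least one value")
    if not population and n < 2:
        raise ValueError("sample std requires at least two values")

    mu = sum(data) / n
    squared_deviations = sum((x - mu) ** 2 for x in data)
    divisor = n if population else n - 1
    return math.sqrt(squared_deviations / divisor)


print(round(std([1, 2, 3, 4, 5]), 2))  # 1.41
```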
4. median(data)
Compute the median (middle value).
- Input: A list of numbers
- Output: The middle value (or average of two middle values)
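A possible median is sketched below. It sorts a copy of the input, so the caller's list is left unchanged, then picks the middle element or averages the two middle elements. Raising ValueError on an empty list is an assumption.

```python
def median(data):
    """Middle value of the sorted data (mean of the two middle values if even)."""
    if not data:
        # Assumption: an empty input is an error.
        raise ValueError("median() requires at least one value")

    ordered = sorted(data)  # sorted() returns a new list
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 5]))  # 3
print(median([1, 2, 3, 4]))     # 2.5
```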
5. correlation(x, y)
Compute the Pearson correlation coefficient.
- Input: Two lists of numbers of equal length
- Output: A value between -1 and 1
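The Pearson coefficient is the covariance of x and y divided by the product of their standard deviations. The sketch below works with sums of deviations directly, since the 1/n factors cancel. Raising ValueError for mismatched lengths, fewer than two points, or a constant input (where the coefficient is undefined) is an assumption.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        # Assumption: fewer than two points is an error.
        raise ValueError("correlation() requires at least two points")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (n times the covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (n times each variance).
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        # A constant input has zero variance, so the ratio is undefined.
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)


print(correlation([1, 2, 3], [1, 2, 3]))  # 1.0
print(correlation([1, 2, 3], [3, 2, 1]))  # -1.0
```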
Examples
mean([1, 2, 3, 4, 5]) # 3.0
variance([1, 2, 3, 4, 5]) # 2.0 (population)
std([1, 2, 3, 4, 5]) # ~1.41
median([1, 2, 3, 4, 5]) # 3
median([1, 2, 3, 4]) # 2.5
correlation([1, 2, 3], [1, 2, 3]) # 1.0 (perfect positive)
Notes
- Do not use NumPy or the statistics library
- You may use math.sqrt
- Handle edge cases appropriately