Find Median from Data Stream

hard · heap, data-structure

Find Median from Data Stream

Design a data structure that supports adding numbers from a stream and finding the median at any time.

Function signature

type MedianFinder struct { /* fields */ }

func Constructor() MedianFinder
func (m *MedianFinder) AddNum(num int)
func (m *MedianFinder) FindMedian() float64

Inputs / outputs

  • AddNum: inserts a number into the data structure.
  • FindMedian: returns the current median.

Example

AddNum(1), AddNum(2) -> FindMedian() = 1.5
AddNum(3) -> FindMedian() = 2

Constraints

  • 1 <= number of operations <= 100_000

Core idea (near-solution)

Maintain two heaps: a max-heap for the lower half and a min-heap for the upper half. Balance them so their sizes differ by at most one.

Invariant: all values in low <= all values in high, and len(low) >= len(high).

Algorithm (step-by-step)

  1. AddNum(x):
    • If low is empty or x <= max(low), push into low; else push into high.
    • Rebalance if len(low) > len(high)+1 or len(high) > len(low).
  2. FindMedian():
    • If sizes equal, return (max(low) + min(high)) / 2.0.
    • Otherwise return max(low).

Correctness sketch

  • Invariant: all values in low <= all values in high, and len(low) >= len(high).
  • The max-heap holds the lower half and the min-heap holds the upper half.
  • Rebalancing maintains size difference at most one and order between heaps.
  • The median is therefore either the top of the max-heap or the average of both tops.

Trace (hand walkthrough)

Trace Walkthrough
Example input:
AddNum(1), AddNum(2) -> FindMedian() = 1.5
AddNum(3) -> FindMedian() = 2

Walk:
- Add 1 -> low=[1], high=[], median=1
- Add 2 -> low=[1], high=[2], median=(1+2)/2=1.5
- Add 3 -> rebalance low=[2,1], high=[3], median=2

Pitfalls to avoid

  • Failing to rebalance after insertion.
  • Returning the wrong heap's top when sizes differ.
  • Using a min-heap for low by mistake.

Complexity

  • Time per AddNum: O(log n).
  • Time per FindMedian: O(1).
  • Extra space: O(n).

Implementation notes (Go)

  • Use container/heap with a custom max-heap for low and min-heap for high.
  • Store ints; convert to float64 in FindMedian.
Run tests to see results
No issues detected