Windowed Aggregation with Watermarks

hard · distributed-systems, streaming, windowing

Windowed Aggregation with Watermarks

You are building a streaming aggregator with tumbling windows. Events arrive in order of arrival, but their event times may be out of order.

Definitions:

  • Window size = W (ms). Window start = (t / W) * W, end = start + W.
  • maxEventTime = max event time seen so far.
  • watermark = maxEventTime - allowedLateness.
  • An event with time < watermark is late and should be dropped.
  • Emit a window when windowEnd <= watermark.

Return the list of emitted windows in the order they are emitted. Each window contains its Start time and the Sum of values.

Types

type Event struct {
    Time  int
    Value int
}

type Window struct {
    Start int
    Sum   int
}

Function signature

func TumblingWindowSums(events []Event, windowSize int, allowedLateness int) []Window

Example

windowSize = 10
allowedLateness = 5
Events: (t=12,v=1), (t=4,v=2), (t=18,v=3)

maxEventTime after first = 12, watermark = 7 -> emit window [0,10)? no (end 10 > 7)
Event t=4 is late (<= 7) -> drop
maxEventTime after third = 18, watermark = 13 -> emit window [10,20) ? end 20 > 13
No windows emitted

Constraints

  • 0 <= len(events) <= 100000
  • 1 <= windowSize <= 1_000_000
  • 0 <= allowedLateness <= 1_000_000
  • 0 <= Time <= 1_000_000

Notes

  • Do not emit windows whose end is greater than the final watermark.
Run tests to see results
No issues detected