Compute statistics in data stream
Company: Akuna Capital
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of streaming data structures and statistical aggregation, specifically maintaining max, mean, and mode for an unbounded integer stream.
Streaming Statistics: Max, Mean, and Mode (Unbounded)
Constraints
- 1 <= x <= 1001 for every added integer
- Queries on an empty stream return None
- Mode ties are broken by returning the smallest value
- The stream is unbounded; the design must not grow with the number of elements
Examples
Input: ([['add', 3], ['add', 7], ['add', 3], ['max'], ['mean'], ['mode']],)
Expected Output: [7, 4.333333333333333, 3]
Explanation: Stream is [3,7,3]. max=7, mean=13/3=4.333..., mode=3 (appears twice).
Input: ([['max'], ['mean'], ['mode']],)
Expected Output: [None, None, None]
Explanation: Empty stream: every query returns None.
Input: ([['add', 5], ['max'], ['mean'], ['mode']],)
Expected Output: [5, 5.0, 5]
Explanation: Single element 5: max=mean=mode=5.
Input: ([['add', 2], ['add', 4], ['mode'], ['add', 4], ['add', 2], ['mode']],)
Expected Output: [2, 2]
Explanation: After [2,4] each value has count 1 -> tie broken to smallest (2). After [2,4,4,2] both have count 2 -> tie again broken to smallest (2).
Input: ([['add', 1001], ['add', 1], ['add', 1001], ['max'], ['mean'], ['mode']],)
Expected Output: [1001, 667.6666666666666, 1001]
Explanation: Boundary values. Stream [1001,1,1001]: max=1001, mean=2003/3=667.666..., mode=1001.
Input: ([['add', 6], ['add', 6], ['add', 6], ['add', 2], ['max'], ['mean'], ['mode']],)
Expected Output: [6, 5.0, 6]
Explanation: Stream [6,6,6,2]: max=6, mean=20/4=5.0, mode=6 (count 3).
Hints
- Because values are bounded to 1..1001, a frequency array of size 1002 gives O(1) space no matter how long the stream is.
- Maintain max, sum, and count incrementally on each add so queries are O(1).
- Track the current best (value, frequency) on each add to answer mode in O(1); update best when an incoming value's new count beats it, or ties it with a smaller value.
Streaming Statistics over a Sliding Window of k Elements
Constraints
- 1 <= k
- 1 <= x <= 1001 for every added integer
- Adding to a full window evicts the oldest element first (FIFO)
- Queries on an empty window return None
- Mode ties are broken by returning the smallest value
Examples
Input: (3, [['add', 1], ['add', 2], ['add', 3], ['max'], ['mean'], ['mode']])
Expected Output: [3, 2.0, 1]
Explanation: Window [1,2,3] (k=3, none evicted yet): max=3, mean=6/3=2.0, mode=1 (all tied, smallest wins).
Input: (2, [['add', 5], ['add', 9], ['add', 4], ['max'], ['mean'], ['mode']])
Expected Output: [9, 6.5, 4]
Explanation: k=2: adding 4 evicts 5, window becomes [9,4]. max=9, mean=13/2=6.5, mode=4 (tie, smallest).
Input: (3, [['max'], ['mean'], ['mode']])
Expected Output: [None, None, None]
Explanation: Empty window: all queries return None.
Input: (1, [['add', 7], ['add', 2], ['max'], ['mean'], ['mode']])
Expected Output: [2, 2.0, 2]
Explanation: k=1: adding 2 evicts 7, window=[2]. All stats equal 2.
Input: (4, [['add', 8], ['add', 8], ['add', 3], ['add', 8], ['add', 1], ['max'], ['mean'], ['mode']])
Expected Output: [8, 5.0, 8]
Explanation: k=4: after adding 1, oldest 8 evicted -> window [8,3,8,1]. max=8, mean=20/4=5.0, mode=8 (count 2).
Input: (2, [['add', 1001], ['add', 1001], ['add', 1], ['max'], ['mean'], ['mode']])
Expected Output: [1001, 501.0, 1]
Explanation: k=2: adding 1 evicts the first 1001 -> window [1001,1]. max=1001, mean=1002/2=501.0, mode=1 (tie, smallest).
Input: (3, [['add', 4], ['add', 6], ['add', 6], ['add', 4], ['add', 4], ['mode'], ['max'], ['mean']])
Expected Output: [4, 6, 4.666666666666667]
Explanation: k=3: after all adds the window is the last 3 = [6,4,4]. mode=4 (count 2), max=6, mean=14/3=4.666...
Hints
- Keep a FIFO deque of the window's elements so you know which value to evict, and a counts array of size 1002 for the value frequencies inside the window.
- On eviction, decrement the evicted value's count and subtract it from the running sum; on add, increment and add.
- Because values are bounded, recompute max by scanning the counts array downward from 1001, and mode by scanning it upward (first value with the highest count handles the smallest-on-tie rule). These scans are O(1) in k.