Streaming Median Tracker
Company: Trexquant
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's ability to design a data structure that maintains a running median over a stream of numbers, testing knowledge of heaps and balanced data partitioning. It is a classic coding and algorithms interview problem used to assess understanding of amortized time complexity and efficient online statistics tracking, requiring practical implementation rather than just conceptual discussion.
Constraints
- The first operation is always "MedianTracker"; every subsequent op is "add" or "median".
- Values passed to add are 32-bit signed integers; they may be negative and may repeat.
- add can be called up to 10^5 times, interleaved arbitrarily with median calls.
- median is never called before the first add.
- When the running count is even, return the exact average of the two middle elements (may be non-integer, e.g. 2.5).
Examples
Input: (["MedianTracker", "add", "median", "add", "median", "add", "median", "add", "median"], [[], [1], [], [3], [], [2], [], [10], []])
Expected Output: [1.0, 2.0, 2.0, 2.5]
Explanation: Running medians after each add: [1]->1.0, [1,3]->avg(1,3)=2.0, [1,2,3]->2.0, [1,2,3,10]->avg(2,3)=2.5. Matches the prompt example.
Input: (["MedianTracker", "add", "median"], [[], [5], []])
Expected Output: [5.0]
Explanation: Single element: the median is that element as a float.
Input: (["MedianTracker", "add", "add", "add", "median", "add", "median"], [[], [-1], [-1], [-1], [], [-5], []])
Expected Output: [-1.0, -1.0]
Explanation: All-duplicate negatives: [-1,-1,-1]->-1.0; after adding -5 -> [-5,-1,-1,-1], the two middle values are both -1 so the average is -1.0.
Input: (["MedianTracker", "add", "add", "median"], [[], [1], [2], []])
Expected Output: [1.5]
Explanation: Even count with a non-integer median: avg(1, 2) = 1.5.
Input: (["MedianTracker", "add", "add", "add", "add", "add", "median"], [[], [6], [10], [2], [6], [5], []])
Expected Output: [6.0]
Explanation: Sorted so far: [2,5,6,6,10]; odd count of 5, the single middle value is 6.0. Confirms unsorted insertion order and duplicates are handled.
Input: (["MedianTracker", "add", "median", "add", "median"], [[], [-10], [], [10], []])
Expected Output: [-10.0, 0.0]
Explanation: Negatives with a running median: [-10]->-10.0; after adding 10 -> [-10,10], avg(-10,10)=0.0.
Hints
- Keep the values split into two halves: a max-heap holding the smaller half and a min-heap holding the larger half. Maintain the invariant that the max-heap has the same size as the min-heap, or exactly one more element.
- Python's heapq is a min-heap; simulate a max-heap by pushing negated values.
- On each add, push then rebalance so |len(small) - len(large)| <= 1. The median is the top of the larger heap when sizes differ, otherwise the average of the two tops.
- Return a float even when the median is a whole number (e.g. 1.0), and use true division so even counts can yield values like 2.5.