Implement Data Structure for Top-K Elements in Streams
Company: Pinterest
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
##### Scenario
Analytics feature that must constantly report the K largest numbers seen so far.
##### Question
Implement a data structure that ingests a stream of integers and returns the top-K elements at any moment. How do you extend it to very large K or distributed streams?
##### Hints
Maintain a min-heap of size K; for distribution, merge local heaps or use count-min sketches.
Quick Answer: This question evaluates competency in streaming algorithms, dynamic data structures for maintaining top-K elements in real time, and scalability considerations for handling large or distributed input streams.
Design a function that maintains the K largest integers seen from a stream. You are given an integer k and a list of operations. Each operation is either ['add', x] to insert integer x into the stream, or ['get'] to query the current top-k numbers. For every 'get' operation, return a list of up to k largest numbers seen so far in descending (non-increasing) order. Duplicates are allowed and should appear as separate entries if present. If k = 0, every 'get' returns an empty list. Implement the function to process all operations and return the results of the 'get' operations in order.
Constraints
- 1 <= len(operations) <= 200000
- -10^9 <= x <= 10^9 for any added value
- 0 <= k <= 100000
- operations[i] is either ['add', x] or ['get']
- Return one list per 'get' operation, each sorted in descending order
- Use O(k) additional space
Hints
- Maintain a min-heap of size at most K: push until size K; thereafter, replace the heap minimum only if the new value is larger.
- For 'get', copy and sort the heap in descending order to return the current top-K.
- For very large K or distributed streams, keep local top-K min-heaps on shards and merge them with a min-heap or k-way merge; for approximate heavy hitters, consider Count-Min Sketch.