Design a data structure for dynamic top‑K frequency
Company: Bloomberg
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates data structure and algorithm design skills, focusing on frequency counting, dynamic maintenance of top-K elements, and efficient update and query operations in a multiset within the Coding & Algorithms domain.
Constraints
- 1 <= len(operations) == len(values) <= 200000
- operations[i] is one of "insert", "remove", or "topK"
- -10^9 <= values[i] <= 10^9 for insert/remove operations
- 1 <= k <= len(operations) for topK operations
- Tie-breaking for equal frequencies must be deterministic: smaller value first
Examples
Input: (["insert", "insert", "insert", "insert", "insert", "topK", "remove", "topK"], [5, 7, 5, 7, 5, 2, 5, 2])
Expected Output: [[5, 7], [5, 7]]
Explanation: After five inserts, frequencies are {5: 3, 7: 2}, so top 2 are [5, 7]. After removing one 5, frequencies become {5: 2, 7: 2}; tie is broken by smaller value first, so [5, 7].
Input: (["insert", "insert", "insert", "remove", "topK", "remove", "topK"], [-1, -2, -1, -3, 3, -1, 3])
Expected Output: [[-1, -2], [-2, -1]]
Explanation: Removing -3 does nothing because it is absent. First query sees {-1: 2, -2: 1}. After removing one -1, both values have frequency 1, so smaller value -2 comes first.
Input: (["topK", "insert", "remove", "topK"], [5, 10, 10, 1])
Expected Output: [[], []]
Explanation: The structure is empty for the first query. After inserting and then removing 10, it is empty again for the last query.
Input: (["insert", "insert", "insert", "insert", "topK", "remove", "remove", "topK", "insert", "topK"], [1, 2, 2, 3, 3, 2, 2, 3, 1, 3])
Expected Output: [[2, 1, 3], [1, 3], [1, 3]]
Explanation: Initially frequencies are {1: 1, 2: 2, 3: 1}, so [2, 1, 3]. Removing 2 twice eliminates it, leaving {1: 1, 3: 1}, so [1, 3]. Inserting 1 again gives {1: 2, 3: 1}, so [1, 3].
Input: (["insert", "insert", "insert", "insert", "insert", "topK", "topK", "remove", "topK", "insert", "topK"], [4, 4, 6, 6, 5, 2, 3, 4, 2, 5, 3])
Expected Output: [[4, 6], [4, 6, 5], [6, 4], [5, 6, 4]]
Explanation: Repeated topK queries must preserve the structure. Ties are always resolved by smaller value first, which determines [4, 6] and later [5, 6, 4].
Hints
- Use a hash map to store the current frequency of each value. Then think about a data structure that can quickly expose the highest-frequency candidates.
- A max-heap with lazy deletion works well: push updated snapshots into the heap, and when answering `topK`, discard heap entries that no longer match the current frequency.