Design top-K frequency structure
Company: Uber
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates ability to design efficient in-memory data structures for frequency tracking, including handling recency-based tie-breaking, memory bounds under churn, and concurrency implications.
Constraints
- 1 <= number of operations <= 10^5
- Items are strings; k is a non-negative integer
- A 'dec' on an item not currently present is a no-op
- An item is deleted exactly when its frequency reaches 0
- Tie-break among equal frequencies is by most-recent update (recency descending)
- topK with k greater than the number of live items returns all live items
Examples
Input: ([['add', 'a'], ['add', 'b'], ['inc', 'a'], ['topK', 2]],)
Expected Output: [['a', 'b']]
Explanation: a is incremented to frequency 2, b stays at 1, so a outranks b.
Input: ([['inc', 'x'], ['inc', 'y'], ['inc', 'y'], ['inc', 'z'], ['inc', 'z'], ['topK', 1]],)
Expected Output: [['z']]
Explanation: x=1, y=2, z=2; y and z tie at frequency 2, but z was updated most recently, so topK(1) returns z.
Input: ([['add', 'a'], ['add', 'b'], ['topK', 2]],)
Expected Output: [['b', 'a']]
Explanation: a and b both have frequency 1; b was added more recently, so the recency tie-break puts b first.
Input: ([['inc', 'a'], ['inc', 'a'], ['dec', 'a'], ['dec', 'a'], ['topK', 3]],)
Expected Output: [[]]
Explanation: a rises to 2 then is decremented back to 0 and deleted, leaving the structure empty.
Input: ([['inc', 'p'], ['inc', 'q'], ['inc', 'p'], ['inc', 'q'], ['inc', 'q'], ['dec', 'q'], ['topK', 2]],)
Expected Output: [['q', 'p']]
Explanation: p=2 and q ends at 2 after the dec; the dec is a non-deleting update that refreshes q's recency, making q more recent than p, so q comes first.
Input: ([['topK', 5]],)
Expected Output: [[]]
Explanation: Querying an empty structure returns an empty list even though k=5.
Input: ([['add', 'a'], ['add', 'a'], ['add', 'b'], ['add', 'c'], ['add', 'b'], ['topK', 10]],)
Expected Output: [['b', 'a', 'c']]
Explanation: Frequencies a=2, b=2, c=1; a and b tie at 2 but b was updated last, so order is b, a, then c. k=10 exceeds the 3 live items so all are returned.
Hints
- Track two maps: item -> frequency and item -> a monotonically increasing recency stamp. Bump the recency stamp on every add/inc and on every dec that does not delete the item.
- For O(1) amortized updates, keep a doubly linked list of frequency buckets; each bucket holds the items at that frequency in recency order, so inc/dec moves an item to an adjacent bucket in constant time.
- When sorting for topK, the key is (frequency descending, recency descending). The optimal retrieval walks buckets from the highest frequency downward, taking items front-to-back until k are collected.