Find top/bottom-k words in list or stream
Company: Google
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates skills in frequency counting, top-k and bottom-k selection, streaming algorithms, and data-structure design for maintaining order, tie-breaking, and dynamic updates.
Part 1: Top and Bottom K Frequent Words in a List
Constraints
- 0 <= len(words) <= 2 * 10^5
- 0 <= k <= 2 * 10^5
- Each word is a non-empty string
- Comparison is case-sensitive unless otherwise stated
Examples
Input: (['i', 'love', 'leetcode', 'i', 'love', 'coding'], 2)
Expected Output: [['i', 'love'], ['coding', 'leetcode']]
Explanation: `i` and `love` both appear twice, so lexicographical order gives `['i', 'love']`. `coding` and `leetcode` both appear once, so lexicographical order gives `['coding', 'leetcode']`.
Input: (['apple', 'banana', 'apple', 'cherry', 'banana', 'date'], 3)
Expected Output: [['apple', 'banana', 'cherry'], ['cherry', 'date', 'apple']]
Explanation: Frequencies are: apple=2, banana=2, cherry=1, date=1. Top 3 are ordered by frequency descending, then lexicographically: `apple`, `banana`, `cherry`. Bottom 3 are ordered by frequency ascending, then lexicographically: `cherry`, `date`, `apple`.
Input: (['z', 'y', 'x', 'x', 'y'], 10)
Expected Output: [['x', 'y', 'z'], ['z', 'x', 'y']]
Explanation: There are only 3 distinct words, so all of them are returned. Top order is by highest frequency first, bottom order is by lowest frequency first.
Input: ([], 3)
Expected Output: [[], []]
Explanation: With no words, both the most frequent and least frequent lists are empty.
Input: (['b', 'a', 'c'], 2)
Expected Output: [['a', 'b'], ['a', 'b']]
Explanation: All words appear once, so both top and bottom orderings are decided entirely by lexicographical order.
Input: (['same', 'same', 'same'], 0)
Expected Output: [[], []]
Explanation: When `k` is 0, both requested lists should be empty.
Hints
- Count how many times each distinct word appears before deciding the answer.
- You can sort the distinct words twice: once by (-frequency, word) and once by (frequency, word).
Part 2: Online Top/Bottom K Words from a Stream
Constraints
- 1 <= len(operations) <= 2 * 10^5
- Each operation is one of ['ingest', word], ['topK', str(k)], or ['bottomK', str(k)]
- 0 <= k <= 2 * 10^5
- Queries may appear before any word is ingested
Examples
Input: [['ingest', 'apple'], ['ingest', 'banana'], ['ingest', 'apple'], ['topK', '2'], ['bottomK', '2']]
Expected Output: [['apple', 'banana'], ['banana', 'apple']]
Explanation: After the ingests, counts are apple=2 and banana=1. topK(2) returns the most frequent first, while bottomK(2) returns the least frequent first.
Input: [['topK', '3'], ['ingest', 'dog'], ['ingest', 'cat'], ['ingest', 'cat'], ['topK', '2'], ['bottomK', '2'], ['topK', '5']]
Expected Output: [[], ['cat', 'dog'], ['dog', 'cat'], ['cat', 'dog']]
Explanation: The first query happens before any words are seen, so it returns []. Later, cat has frequency 2 and dog has frequency 1.
Input: [['ingest', 'banana'], ['ingest', 'apple'], ['ingest', 'cherry'], ['topK', '5'], ['bottomK', '2']]
Expected Output: [['apple', 'banana', 'cherry'], ['apple', 'banana']]
Explanation: All three words have frequency 1, so ties are broken lexicographically: apple, banana, cherry.
Input: [['ingest', 'zebra'], ['ingest', 'zebra'], ['ingest', 'ant'], ['topK', '0'], ['bottomK', '1']]
Expected Output: [[], ['ant']]
Explanation: A query with k=0 returns an empty list. For bottomK(1), ant has frequency 1 and zebra has frequency 2, so ant is returned.
Hints
- Keep an exact frequency map for all words seen so far.
- A max-heap and a min-heap with lazy deletion let you support online updates without fully resorting all words on every query.
Part 3: First/Last K Non-Repeating Words in a Stream
Constraints
- 1 <= len(operations) <= 2 * 10^5
- Each operation is one of ['ingest', word], ['firstK', str(k)], or ['lastK', str(k)]
- 0 <= k <= 2 * 10^5
- Queries may appear before any word is ingested
Examples
Input: ([['ingest', 'a'], ['ingest', 'b'], ['ingest', 'a'], ['ingest', 'b'], ['ingest', 'c'], ['ingest', 'd'], ['firstK', '2'], ['ingest', 'c'], ['ingest', 'e'], ['lastK', '2']],)
Expected Output: [['c', 'd'], ['d', 'e']]
Explanation: After the first six ingests, only 'c' and 'd' are non-repeating, so firstK 2 returns ['c', 'd']. After ingesting another 'c' and then 'e', the non-repeating words are ['d', 'e'], so lastK 2 returns them in arrival order.
Input: ([['firstK', '3'], ['ingest', 'x'], ['ingest', 'x'], ['firstK', '1'], ['lastK', '2']],)
Expected Output: [[], [], []]
Explanation: The first query happens before any ingest, so it returns []. After 'x' is ingested twice, it is repeating, so there are no non-repeating words for the remaining queries.
Input: ([['ingest', 'red'], ['ingest', 'blue'], ['ingest', 'green'], ['lastK', '2'], ['ingest', 'blue'], ['firstK', '5'], ['lastK', '5']],)
Expected Output: [['blue', 'green'], ['red', 'green'], ['red', 'green']]
Explanation: Initially the non-repeating order is ['red', 'blue', 'green'], so lastK 2 is ['blue', 'green']. After 'blue' repeats, the remaining non-repeating words are ['red', 'green']; both later queries return those two words.
Input: ([['ingest', 'x'], ['ingest', 'y'], ['ingest', 'x'], ['ingest', 'x'], ['ingest', 'z'], ['firstK', '3'], ['ingest', 'y'], ['lastK', '2']],)
Expected Output: [['y', 'z'], ['z']]
Explanation: 'x' appears three times and is not non-repeating. Before the last two operations, the non-repeating words are ['y', 'z'], so firstK 3 returns both. After 'y' repeats, only 'z' remains.
Hints
- When a word appears for the first time, it becomes non-repeating. When it appears for the second time, it must be removed from the non-repeating order.
- A doubly linked list plus a hash map from word to node supports O(1) insertion and removal while preserving arrival order.