Build a next-word frequency predictor
Company: Google
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Onsite
Quick Answer: This question evaluates understanding of data structures and algorithms for frequency counting and associative lookups, including preprocessing-versus-query trade-offs, tie-breaking, and handling edge cases in sequential data.
Constraints
- 0 <= len(tokens), len(queries) <= 200000
- Each token and query is a string
- If multiple next words have the same highest frequency, return the lexicographically smallest one
- Your approach should preprocess once and answer each query in O(1) average time afterward
Examples
Input: (['i', 'am', 'sam', 'i', 'am', 'happy', 'i', 'am', 'sam'], ['i', 'am', 'sam', 'happy', 'unknown'])
Expected Output: ['am', 'sam', 'i', 'i', '']
Explanation: i is followed by am 3 times. am is followed by sam 2 times and happy 1 time, so sam wins. sam is followed by i once. happy is followed by i once. unknown has no following word.
Input: (['a', 'c', 'a', 'b', 'a', 'c', 'a', 'b'], ['a', 'b', 'c'])
Expected Output: ['b', 'a', 'a']
Explanation: For a, both b and c follow it 2 times, so the lexicographically smaller word b is returned. b is followed by a once. c is followed by a twice.
Input: (['solo'], ['solo', 'x'])
Expected Output: ['', '']
Explanation: There are no adjacent pairs in a single-word corpus, so no query has a valid prediction.
Input: ([], ['anything'])
Expected Output: ['']
Explanation: An empty corpus has no next-word information.
Input: (['go', 'home', 'go', 'home', 'stay'], ['home', 'stay', 'go'])
Expected Output: ['go', '', 'home']
Explanation: home is followed by go once and stay once, so go is chosen by lexicographic tie-break. stay has no following word. go is followed by home twice.
Hints
- Use a hash map from each word to another hash map that stores counts of words that follow it.
- To make queries fast, keep track of the current best next word for each word while building the counts.