Predict most likely next word from training data
Company: Bloomberg
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates proficiency in frequency-based language modeling and data-structure design, focusing on token sequence analysis, frequency counting, and handling boundary conditions.
Constraints
- 0 <= number of sentences <= 10^4
- 0 <= words per sentence <= 10^4
- Words are non-empty lowercase/mixed-case ASCII strings
- Sentinel for 'no next word' is the empty string ""
- Ties resolved by lexicographically smallest next word
Examples
Input: ([["I", "am", "sam"], ["am", "sam"]], ["I", "am", "sam"])
Expected Output: ["am", "sam", ""]
Explanation: "I"->"am" (1), "am"->"sam" (2 total), "sam" never has a following word so it returns the sentinel.
Input: ([["the", "cat", "the", "dog", "the", "cat"]], ["the"])
Expected Output: ["cat"]
Explanation: "the" is followed by "cat" twice and "dog" once, so the most frequent next word is "cat".
Input: ([["a", "b"], ["a", "c"]], ["a"])
Expected Output: ["b"]
Explanation: "a"->"b" and "a"->"c" both have count 1; the tie is broken lexicographically, so "b" wins.
Input: ([], ["x"])
Expected Output: [""]
Explanation: Empty training data yields no transitions, so the query returns the empty-string sentinel.
Input: ([["hello"]], ["hello", "world"])
Expected Output: ["", ""]
Explanation: "hello" appears only at sentence end (no following word) and "world" never appears, so both return the sentinel.
Input: ([["x", "y", "z", "y", "w"], ["x", "y"]], ["x", "y", "z", "w"])
Expected Output: ["y", "w", "y", ""]
Explanation: "x"->"y" (2). "y"->"z" (1) and "y"->"w" (1) tie, lexicographic pick is "w". "z"->"y" (1). "w" is only at an end, so it returns the sentinel.
Hints
- Make one pass over every sentence and, for each adjacent pair (a, b), increment a nested count map: counts[a][b] += 1.
- To answer a query for word w, scan counts[w] and pick the entry with the highest count; on equal counts prefer the lexicographically smaller word.
- If w isn't a key in your map (or it appears only at sentence ends), there are no transitions — return the empty string "".