Implement Streaming Word Counter
Company: Netflix
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates streaming data processing, string tokenization, and frequency-aggregation competencies, focusing on maintaining accurate counts over incremental text inputs using associative data structures.
Constraints
- 1 <= len(operations) == len(values) <= 10^4
- 0 <= total number of words across all 'add_text' operations <= 10^5
- Words are separated by whitespace
- Word matching is case-sensitive
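The tokenization rules above can be checked directly against Python's built-in str.split(). This is a minimal sketch, and the helper name tokenize is hypothetical. It shows that splitting with no argument treats any run of whitespace as a single separator. It also shows that case-distinct tokens remain separate strings, and that an explicit " " separator behaves differently.

```python
def tokenize(text: str) -> list:
    # Hypothetical helper. str.split() with no argument splits on runs of any
    # whitespace (spaces, tabs, newlines) and never yields empty strings.
    return text.split()

print(tokenize("  Hi\thi\n HI  "))  # ['Hi', 'hi', 'HI'] (case-sensitive, so three distinct tokens)
print(tokenize(" "))               # [] (whitespace-only text contributes no words)
# An explicit single-space separator keeps empty tokens, which would be miscounted as words.
print(" a  b ".split(" "))         # ['', 'a', '', 'b', '']
```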
Examples
Input: (['add_text', 'get_count', 'get_count', 'get_count'], ['hello world hello', 'hello', 'world', 'missing'])
Expected Output: [None, 2, 1, 0]
Explanation: After adding the text, 'hello' appears twice and 'world' appears once. 'missing' has never appeared, so its count is 0.
Input: (['add_text', 'add_text', 'get_counts', 'get_count', 'get_counts'], ['a b a', 'b c', None, 'c', None])
Expected Output: [None, None, {'a': 2, 'b': 2, 'c': 1}, 1, {'a': 2, 'b': 2, 'c': 1}]
Explanation: The second add_text call increments 'b' to 2 and adds 'c' with a count of 1. Each get_counts call returns the full mapping as it stands at that moment.
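The requirement that get_counts return the current mapping raises a question: should the result be a live reference to the internal dictionary or a snapshot? The sketch below assumes a module-level dict and a free add_text function, both purely for illustration. It shows that returning the dict itself lets a later add_text change a result that was already handed out, while dict(...) takes an independent shallow copy.

```python
counts = {}

def add_text(text):
    # Illustrative only: increment the running count of each whitespace-separated word.
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1

add_text("a b a")
live = counts            # alias of the same dict object
snapshot = dict(counts)  # shallow copy taken at this moment
add_text("b c")

print(live)      # {'a': 2, 'b': 2, 'c': 1} (reflects the later add_text)
print(snapshot)  # {'a': 2, 'b': 1} (frozen at the time it was taken)
```

In Example 2 the two get_counts outputs happen to be equal, so aliasing would not change the result there. When add_text calls fall between get_counts calls, returning a copy keeps each earlier result correct.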
Input: (['add_text', 'add_text', 'get_count', 'get_count', 'get_counts'], [' ', 'Hi hi HI ', 'Hi', 'hi', None])
Expected Output: [None, None, 1, 1, {'Hi': 1, 'hi': 1, 'HI': 1}]
Explanation: Whitespace-only text adds nothing, and matching is case-sensitive, so 'Hi', 'hi', and 'HI' are counted separately.
Input: (['get_counts', 'get_count', 'add_text', 'get_counts'], [None, 'anything', '', None])
Expected Output: [{}, 0, None, {}]
Explanation: Before any text is added, the counter is empty. Adding an empty string does not change the counts.
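The empty-counter and missing-word cases can also be handled by collections.Counter from the standard library. This is an alternative to the plain dictionary the hints suggest, not something the problem requires. In the sketch below, reading a missing key returns 0 without inserting it, and updating with the tokens of an empty string changes nothing. Both behaviors match Examples 1 and 4.

```python
from collections import Counter

counter = Counter()
print(counter["anything"])    # 0 (missing keys read as zero)
print("anything" in counter)  # False (the read did not insert the key)

counter.update("".split())    # an empty string yields no tokens, so nothing changes
print(dict(counter))          # {}

counter.update("hello world hello".split())  # counts each token in the iterable
print(counter["hello"], counter["world"], counter["missing"])  # 2 1 0
```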
Hints
- Use a dictionary to store the running frequency for each word.
- In Python, text.split() with no separator splits on runs of any whitespace (spaces, tabs, newlines) and ignores leading and trailing whitespace, so it never produces empty tokens.
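The hints translate into a small dictionary-backed class. Below is a minimal sketch. The class name StreamingWordCounter and the run driver are hypothetical. The driver's (operations, values) signature is an assumption modeled on the example inputs, not a confirmed judge interface. get_counts returns a copy so that later add_text calls cannot alter earlier results.

```python
from typing import Dict, List, Optional, Union


class StreamingWordCounter:
    """Running word frequencies over incrementally added text (hypothetical name)."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def add_text(self, text: str) -> None:
        # split() with no separator drops empty tokens, so empty or
        # whitespace-only text leaves the counts unchanged.
        for word in text.split():
            self._counts[word] = self._counts.get(word, 0) + 1

    def get_count(self, word: str) -> int:
        # Unseen words report 0; keys are raw tokens, so matching is case-sensitive.
        return self._counts.get(word, 0)

    def get_counts(self) -> Dict[str, int]:
        # Shallow copy: later add_text calls cannot mutate a result already returned.
        return dict(self._counts)


def run(operations: List[str],
        values: List[Optional[str]]) -> List[Union[None, int, Dict[str, int]]]:
    """Replay (operation, value) pairs in the format used by the examples (assumed)."""
    counter = StreamingWordCounter()
    results: List[Union[None, int, Dict[str, int]]] = []
    for op, value in zip(operations, values):
        if op == "add_text":
            counter.add_text(value)
            results.append(None)
        elif op == "get_count":
            results.append(counter.get_count(value))
        elif op == "get_counts":
            results.append(counter.get_counts())
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return results


print(run(['add_text', 'get_count', 'get_count', 'get_count'],
          ['hello world hello', 'hello', 'world', 'missing']))
# [None, 2, 1, 0]
```

Each add_text runs in time linear in the length of its text. get_count is an average O(1) dictionary lookup. get_counts costs O(d) for d distinct words because of the copy. If get_counts is called very often on a large vocabulary, returning a read-only view would be cheaper, but that view would reflect later updates rather than a fixed snapshot.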