Detect and remove matched words in a char stream
Company: Google
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates understanding of online stream processing, substring matching, and the use of efficient data structures and algorithms for real-time updates and deletions.
Constraints
- 0 <= len(words) <= 10^4
- 0 <= sum(len(word) for word in words) <= 10^5
- 0 <= len(stream) <= 10^5
- Each word is non-empty and consists only of lowercase English letters
- stream consists only of lowercase English letters
Examples
Input: (['ab', 'bc'], 'zabcbc')
Expected Output: (['', '', 'ab', '', '', 'bc'], 'zc')
Explanation: After the 3rd character, the buffer is 'zab', so 'ab' is removed. Later, after the 6th character, the buffer is 'zcbc', so 'bc' is removed, leaving 'zc'.
Input: (['ba', 'cba'], 'xcba')
Expected Output: (['', '', '', 'cba'], 'x')
Explanation: At the last step, both 'ba' and 'cba' are suffixes of 'xcba'. The longest suffix match is 'cba', so it is removed.
Input: (['a', 'aa'], '')
Expected Output: ([], '')
Explanation: An empty stream produces no reports, and the final buffer stays empty.
Input: ([], 'abc')
Expected Output: (['', '', ''], 'abc')
Explanation: With no target words, nothing can ever be removed.
Input: (['abc', 'bc'], 'abcbc')
Expected Output: (['', '', 'abc', '', 'bc'], '')
Explanation: At the 3rd character, 'abc' and 'bc' both match as suffixes, so the longer word 'abc' is removed. Later, 'bc' is removed at the end.
Hints
- Because the buffer is fully cleaned after each character, any new match created by appending a character must end at that character. So you only need to detect matching suffixes of the current buffer.
- Use a trie with failure links (Aho-Corasick). If you also keep a stack of automaton states for the current buffer, you can remove a matched suffix by popping characters and states.