Stream output until stop token appears
Company: Microsoft
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates streaming string processing and pattern-matching skills, specifically handling token detection across chunk boundaries, boundary conditions, and time/space efficiency. It is commonly asked in the Coding & Algorithms domain to assess practical application-level competence in stream-processing algorithms rather than purely theoretical understanding.
Constraints
- 0 <= len(chunks) <= 10^5
- 1 <= len(stopToken) <= 10^5
- 0 <= sum(len(chunk) for chunk in chunks) <= 2 * 10^5
- Chunks may contain empty strings
- Aim for O(total_characters + len(stopToken)) time and O(len(stopToken)) extra space
Examples
Input: (["hello", "how are you<END>", "HAPPY"], "<END>")
Expected Output: ["hello", "how are you"]
Explanation: The stop token appears completely inside the second chunk, so emit everything before it and ignore the rest.
Input: (["hello", "how are you<E", "ND>HAPPY"], "<END>")
Expected Output: ["hello", "how are you"]
Explanation: The token is split across chunk boundaries. The trailing "<E" from the second chunk must be buffered, not emitted.
Input: (["ab<E", "Xcd"], "<END>")
Expected Output: ["ab", "<EXcd"]
Explanation: After seeing "<E" you must wait, but the next character is "X", so the buffered text is no longer part of the stop token and becomes safe to emit.
Input: (["<", "END>", "later"], "<END>")
Expected Output: []
Explanation: The stream starts with the stop token, split across chunks, so nothing is emitted.
Input: (["abc", "#def", "ghi"], "#")
Expected Output: ["abc"]
Explanation: With a one-character stop token, emission stops immediately when that character is first seen.
Input: (["", "abc", "", "<EN", "D>", "tail"], "<END>")
Expected Output: ["abc"]
Explanation: Empty chunks should be handled correctly. The stop token appears later across two chunks, so only "abc" is emitted.
Input: (["abc<EN"], "<END>")
Expected Output: ["abc", "<EN"]
Explanation: The stream ends before the buffered partial match can become a full stop token, so the remaining buffered suffix must be flushed at end-of-stream.
Input: (["aaaa", "c"], "aaab")
Expected Output: ["a", "aaac"]
Explanation: Because the stop token overlaps with itself, only one "a" is safe after the first chunk. When "c" arrives, the buffered "aaa" can no longer form "aaab" and becomes safe.
Hints
- At any moment, the only text you must keep is the longest suffix of already-seen characters that could still be the beginning of `stopToken`.
- A prefix-function / KMP-style matcher lets you update that suffix length in amortized O(1) per character, even when the token overlaps with itself.