Extract stream prefix before stop token
Company: Microsoft
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
You are consuming a text stream delivered as an iterator/generator of **string chunks** (each chunk may be any length, including empty). You are also given a **stop token** string `token`.
Concatenate the chunks conceptually into one long string `S`. Your task is to return the **prefix of `S` that appears strictly before the first occurrence of `token`**.
Key requirements:
- The stop token may be **split across chunk boundaries** (e.g., `"ab"` in one chunk and `"c"` in the next forms token `"abc"`).
- You must be able to **return as soon as the first full token is detected**. You are **not allowed** to buffer the entire stream and then search.
- Aim for low extra memory usage (e.g., proportional to `len(token)` rather than total stream size).
If the token never appears, return the entire concatenated stream content (up to end-of-stream).
### Function signature (example)
- Input: `chunks: Iterator[str]`, `token: str`
- Output: `str`
### Examples
1) `chunks = ["hello ", "wor", "ld<END>", "ignored"]`, `token = "<END>"` → return `"hello world"`
2) `chunks = ["abc", "d", "ef"]`, `token = "cde"` (token crosses boundary) → return `"ab"`
3) `chunks = ["aaaaa"]`, `token = "b"` → return `"aaaaa"`
### Constraints (reasonable interview assumptions)
- `1 <= len(token) <= 10^5`
- Total streamed characters can be large (potentially millions), so the solution should be close to linear time in the number of consumed characters.
Implement the function to satisfy the requirements above.
Quick Answer: This question evaluates streaming string processing, incremental substring detection across chunk boundaries, and memory-efficient algorithm design for low-latency consumption of potentially large text streams.
Given text chunks and a stop token, return the stream prefix before the first stop-token occurrence, including when the token spans chunks.
Constraints
- Inputs are provided as Python literals compatible with the function signature.
- Return a deterministic value exactly matching the requested output.
Examples
Input: (['abc', 'deSTOPfg'], 'STOP')
Expected Output: 'abcde'
Explanation: Inside one chunk.
Input: (['hello ST', 'OP world'], 'STOP')
Expected Output: 'hello '
Explanation: Across chunks.
Input: (['no', ' marker'], '<end>')
Expected Output: 'no marker'
Explanation: Absent stop token.
Input: (['abc'], '')
Expected Output: ''
Explanation: Empty stop token.
Hints
- Start with a direct data structure representation.
- Handle edge cases before the main loop.