Detect stop tokens during streaming inference
Company: Microsoft
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates streaming sequence-detection and pattern-matching skills, including online buffer management and correct handling of overlapping and partial multi-token stop sequences in LLM inference, and falls under the Coding & Algorithms category with domain focus on streaming algorithms for machine learning inference.
Constraints
- 0 <= len(stream) <= 200000
- 0 <= len(stop_sequences) <= 200000
- The sum of lengths of all normalized stop sequences is at most 200000
- Each non-empty stop sequence has length at least 1
- Token IDs are integers
Examples
Input: ([5, 7, 8, 9, 10], [[8, 9]])
Expected Output: [5, 7]
Explanation: The first stop sequence [8, 9] starts at index 2, so only the tokens before it are returned.
Input: ([4, 1, 2, 1, 3], [[1, 2, 1]])
Expected Output: [4]
Explanation: The stop sequence overlaps with its own prefix and suffix, but it is still detected as soon as the final 1 arrives.
Input: ([1, 2, 4], [[1, 2, 3]])
Expected Output: [1, 2, 4]
Explanation: The prefix [1, 2] is only a partial match. Since the stream never produces 3, nothing is removed.
Input: ([6, 7, 8], [7])
Expected Output: [6]
Explanation: A stop sequence can be given as a single integer. Generation stops immediately when 7 arrives.
Input: ([5, 6, 7, 8], [[7, 8], 6])
Expected Output: [5]
Explanation: The single-token stop 6 is detected before the longer stop [7, 8] can complete.
Input: ([], [[1, 2]])
Expected Output: []
Explanation: With no generated tokens, the result is empty.
Input: ([2, 3, 4], [])
Expected Output: [2, 3, 4]
Explanation: Without any stop sequences, the entire stream is returned.
Hints
- A trie lets you match many stop sequences at once instead of checking every stop sequence against every suffix of the stream.
- You only need to keep the last max_stop_length - 1 un-emitted tokens in a pending buffer. Anything older can no longer be part of a future stop sequence.