Detect stop tokens during streaming inference
Company: Microsoft
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
## Problem: Stop-token / stop-sequence detection in streaming generation
During LLM inference you receive tokens incrementally (streaming). Implement logic that decides when to stop generation based on one or more **stop sequences**.
### Input
- A stream/iterator of generated token IDs (or strings).
- A list of stop sequences, where each stop sequence can be:
  - a single token ID, or
  - a list of token IDs representing a multi-token sequence (e.g., `[A, B, C]`).
### Output / behavior
- As tokens arrive, emit generated tokens **up to but not including** the first occurrence of any stop sequence.
- Stop as soon as any stop sequence is detected.
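To pin down the expected behavior (this is a naive batch reference, not the streaming solution; the function name `truncate_at_stop` is chosen here for illustration), assuming integer token IDs:

```python
from typing import List, Sequence, Union

def truncate_at_stop(
    tokens: List[int],
    stops: List[Union[int, Sequence[int]]],
) -> List[int]:
    """Naive reference: cut at the earliest start of any stop sequence."""
    best = len(tokens)  # default: no stop sequence found, keep everything
    for s in stops:
        seq = list(s) if isinstance(s, (list, tuple)) else [s]
        for i in range(len(tokens) - len(seq) + 1):
            if tokens[i:i + len(seq)] == seq:
                best = min(best, i)
                break
    return tokens[:best]

# truncate_at_stop([7, 7, 1, 2, 1, 9], [[1, 2, 1]]) → [7, 7]
```

This batch version rescans from every position, which is exactly what the streaming requirements below rule out; it only serves to define what "up to but not including" means.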
### Requirements
- Must work for overlapping matches (e.g., stop sequence `[1,2,1]` against a stream ending in `..., 1, 2, 1`).
- Must handle a partial stop sequence at the end of the current buffer that may be completed by future tokens.
- Must be efficient: do not rescan the entire token history on each new token.
### Clarifications
- If multiple stop sequences could match ending at the same position, stopping is immediate regardless of which matched.
- If the stream ends without a stop sequence, return all tokens.
Describe your approach and its complexity, then implement it in your chosen language.
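One possible sketch in Python (the function name `stream_until_stop` is mine, and it assumes integer token IDs): withhold the last `max_len - 1` tokens, where `max_len` is the longest stop sequence, since only those can still be a prefix of a future match; everything older is safe to emit.

```python
from typing import Iterable, Iterator, List, Sequence, Union

def stream_until_stop(
    tokens: Iterable[int],
    stop_sequences: List[Union[int, Sequence[int]]],
) -> Iterator[int]:
    """Yield tokens from the stream, stopping before the first stop sequence."""
    # Normalize: a bare token ID becomes a length-1 sequence.
    stops = [list(s) if isinstance(s, (list, tuple)) else [s] for s in stop_sequences]
    max_len = max((len(s) for s in stops), default=0)
    # Number of trailing tokens withheld because they could still be
    # the prefix of a stop sequence completed by future tokens.
    hold = max(max_len - 1, 0)
    buffer: List[int] = []
    for tok in tokens:
        buffer.append(tok)
        # Does any stop sequence end exactly at the current position?
        for s in stops:
            if len(buffer) >= len(s) and buffer[-len(s):] == s:
                yield from buffer[:len(buffer) - len(s)]  # emit prefix only
                return
        # Tokens older than the hold-back window can no longer be part
        # of a future match, so they are safe to emit now.
        if len(buffer) > hold:
            emit = len(buffer) - hold
            yield from buffer[:emit]
            del buffer[:emit]
    # Stream ended with no stop sequence: flush the withheld tail.
    yield from buffer
```

Per-token work is O(total length of the stop sequences), independent of stream history, and the buffer never exceeds `max_len` tokens. With many or long stop sequences, the per-token suffix checks could be replaced by an Aho-Corasick automaton for amortized O(1) matching per token.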
Quick Answer: This question evaluates streaming sequence detection and pattern matching: online buffer management and correct handling of overlapping and partial multi-token stop sequences during LLM inference. It falls under the Coding & Algorithms category, with a domain focus on streaming algorithms for machine learning inference.