Problem: Stop-token / stop-sequence detection in streaming generation
During LLM inference you receive tokens incrementally (streaming). Implement logic that decides when to stop generation based on one or more stop sequences.
Input
- A stream/iterator of generated token IDs (or strings).
- A list of stop sequences, where each stop sequence is either:
  - a single token ID, or
  - a list of token IDs representing a multi-token sequence (e.g., `[A, B, C]`).
Output / behavior
- As tokens arrive, emit generated tokens **up to but not including** the first occurrence of any stop sequence.
- Stop as soon as any stop sequence is detected.
Requirements
- Must handle overlapping matches (e.g., stop sequence `[1, 2, 1]` against the stream `..., 1, 2, 1`).
- Must handle a partial stop sequence that appears at the end of the current buffer and completes with future tokens.
- Must be efficient: do not rescan the entire history on every token.
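One way to satisfy the partial-match requirement is to compute, after each token, the length of the longest suffix of the buffer that is still a proper prefix of some stop sequence; every token before that suffix can never be part of a future match and is safe to emit. A minimal sketch of that check (the helper name `pending_suffix_len` is ours, not part of the problem):

```python
def pending_suffix_len(buffer, stops):
    """Length of the longest suffix of `buffer` that is a proper
    prefix of some stop sequence; these tokens must be held back."""
    best = 0
    for stop in stops:
        # Try proper prefixes of this stop sequence, longest first.
        for k in range(min(len(stop) - 1, len(buffer)), 0, -1):
            if buffer[-k:] == list(stop[:k]):
                best = max(best, k)
                break
    return best
```

For stop sequence `[1, 2, 1]` and buffer ending in `..., 1, 2`, this returns 2: both trailing tokens could still complete into a match and must be withheld.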
Clarifications
- If multiple stop sequences could match ending at the same position, stop immediately regardless of which one matched.
- If the stream ends without a stop sequence, return all tokens.
Describe your approach and complexity; implement in your chosen language.
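For reference, a compact sketch of one possible solution: keep a buffer of unemitted tokens, check after every arrival whether any stop sequence is a suffix of it, and otherwise hold back only the last `max_len - 1` tokens (the most that could still belong to a future match). The function name `stream_until_stop` is illustrative, not prescribed by the problem:

```python
from typing import Iterable, Iterator, List, Sequence, Union

StopSpec = Union[int, Sequence[int]]

def stream_until_stop(tokens: Iterable[int], stops: List[StopSpec]) -> Iterator[int]:
    """Yield tokens up to (not including) the first stop-sequence match."""
    # Normalize: a bare token ID becomes a length-1 sequence.
    norm = [list(s) if isinstance(s, (list, tuple)) else [s] for s in stops]
    max_len = max(len(s) for s in norm)
    buf: List[int] = []  # tokens received but not yet emitted
    for tok in tokens:
        buf.append(tok)
        # Does any stop sequence end exactly at the current position?
        for s in norm:
            if len(buf) >= len(s) and buf[-len(s):] == s:
                # Emit everything before the match, then stop.
                yield from buf[:len(buf) - len(s)]
                return
        # Any future match has length <= max_len, so it can involve at
        # most the last max_len - 1 buffered tokens; older ones are safe.
        while len(buf) > max_len - 1:
            yield buf.pop(0)
    # Stream ended with no stop sequence: flush everything held back.
    yield from buf
```

Per-token cost is O(total length of all stop sequences) for the suffix check and independent of history length, meeting the no-rescan requirement. With many stop sequences, an Aho-Corasick automaton over the stop set would bring the per-token check down to amortized O(1).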