This question evaluates streaming sequence detection and pattern matching, including online buffer management and the correct handling of overlapping and partial multi-token stop sequences in LLM inference. It falls under the Coding & Algorithms category, with a domain focus on streaming algorithms for machine-learning inference.
During LLM inference you receive tokens incrementally (streaming). Implement logic that decides when to stop generation based on one or more stop sequences.
A stop sequence may span multiple tokens (e.g., [A, B, C]). Your solution must handle overlapping and partial matches correctly (e.g., stop sequence [1, 2, 1] matched against a stream ending ..., 1, 2, 1).
Describe your approach and its time and space complexity, then implement it in a language of your choice.
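One possible sketch of a solution, in Python: run a KMP automaton per stop sequence so each token is processed in amortized O(1) per pattern, and hold back only the tokens that could still be part of a partial match (everything else is safe to emit immediately). The class name `StopDetector` and its `feed` API are my own illustration, not part of the question.

```python
class StopDetector:
    """Streaming multi-pattern stop-sequence detector (one KMP automaton per pattern)."""

    def __init__(self, stop_sequences):
        self.patterns = [list(p) for p in stop_sequences]
        self.fail = [self._failure(p) for p in self.patterns]
        self.states = [0] * len(self.patterns)   # matched-prefix length per pattern
        self.buffer = []                         # held-back tokens (possible partial match)
        self.stopped = False

    @staticmethod
    def _failure(p):
        # Standard KMP failure function: f[i] = length of the longest proper
        # prefix of p[:i+1] that is also a suffix.
        f = [0] * len(p)
        k = 0
        for i in range(1, len(p)):
            while k > 0 and p[i] != p[k]:
                k = f[k - 1]
            if p[i] == p[k]:
                k += 1
            f[i] = k
        return f

    def feed(self, tok):
        """Process one token; return (tokens now safe to emit, stopped?)."""
        if self.stopped:
            return [], True
        self.buffer.append(tok)
        for idx, p in enumerate(self.patterns):
            s = self.states[idx]
            # Fall back through failure links until the token extends a prefix.
            while s > 0 and p[s] != tok:
                s = self.fail[idx][s - 1]
            if p[s] == tok:
                s += 1
            self.states[idx] = s
            if s == len(p):
                # Full match: emit everything before the stop sequence, suppress it.
                out = self.buffer[: len(self.buffer) - len(p)]
                self.buffer = []
                self.stopped = True
                return out, True
        # Hold back only the longest current partial match; emit the rest.
        hold = max(self.states, default=0)
        cut = len(self.buffer) - hold
        out, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return out, False
```

Complexity: O(k) amortized work per token for k stop sequences (each automaton advances or falls back via failure links), and O(L) extra memory for the held-back buffer, where L is the longest pattern. For many patterns, a single Aho–Corasick automaton would reduce the per-token cost to amortized O(1). A quick usage check with stop sequence [1, 2, 1] and the stream 0, 1, 2, 2, 1, 2, 1: the detector emits 0, then flushes 1, 2, 2 once the partial match breaks, then stops on the final 1, 2, 1 without emitting it.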