PracHub

Detect stop tokens during streaming inference

Last updated: Mar 29, 2026

Quick Overview

This question evaluates streaming sequence-detection and pattern-matching skills: maintaining an online buffer and correctly handling overlapping and partial multi-token stop sequences during LLM inference. It falls under the Coding & Algorithms category, with a domain focus on streaming algorithms for machine-learning inference.

Detect stop tokens during streaming inference

Company: Microsoft

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite



Related Interview Questions

  • Sort Three Categories In Place - Microsoft (medium)
  • Implement K-Means and Detect Divisible Subarrays - Microsoft (medium)
  • Implement SFT Sample Packing - Microsoft (medium)
  • Implement SQL Table and DNA Ordering - Microsoft (medium)
  • Solve power jumps and graph tour - Microsoft (hard)
Feb 11, 2026, 12:00 AM

Problem: Stop-token / stop-sequence detection in streaming generation

During LLM inference you receive tokens incrementally (streaming). Implement logic that decides when to stop generation based on one or more stop sequences.

Input

  • A stream/iterator of generated token IDs (or strings).
  • A list of stop sequences, where each stop sequence can be:
    • a single token ID, or
    • a list of token IDs representing a multi-token sequence (e.g., [A, B, C]).
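
Since each stop sequence may arrive as either a bare token ID or a list of IDs, one convenient first step (a convention assumed here, not stated in the problem) is to normalize every spec to a tuple of IDs. The helper name `normalize_stops` is illustrative:

```python
def normalize_stops(stops):
    """Normalize stop specs: a bare token ID becomes a 1-tuple,
    a list/tuple of IDs becomes a tuple. Returns a list of tuples."""
    out = []
    for s in stops:
        if isinstance(s, (list, tuple)):
            out.append(tuple(s))
        else:
            out.append((s,))
    return out

normalize_stops([7, [1, 2, 1]])  # -> [(7,), (1, 2, 1)]
```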

Output / behavior

  • As tokens arrive, emit generated tokens up to but not including the first occurrence of any stop sequence.
  • Stop as soon as any stop sequence is detected.
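
One way to pin down the emit/hold-back behavior before optimizing is a buffered generator: emit every token that can no longer participate in a match, hold back any suffix that is still a prefix of some stop sequence, and stop the moment a full stop sequence ends at the newest token. A minimal sketch (the function name and the longest-match tie-break are choices of this sketch, not requirements of the problem):

```python
def stream_until_stop(tokens, stop_seqs):
    """Yield tokens up to (not including) the first stop-sequence match.

    tokens: iterable of token IDs; stop_seqs: stop sequences as tuples.
    Per-token work is bounded by the stop-sequence lengths, never by the
    length of the stream, because only a short suffix is ever buffered.
    """
    stops = [tuple(s) for s in stop_seqs]
    max_len = max(len(s) for s in stops)
    buf = []  # held-back tokens (at most max_len - 1 after each step)
    for tok in tokens:
        buf.append(tok)
        # Does any stop sequence end exactly at the newest token?
        done = [len(s) for s in stops
                if len(buf) >= len(s) and tuple(buf[-len(s):]) == s]
        if done:
            # Tie-break: when several stops end here, drop the longest.
            yield from buf[:len(buf) - max(done)]
            return
        # Hold back the longest suffix that is still a prefix of some stop.
        keep = 0
        for k in range(min(len(buf), max_len - 1), 0, -1):
            if any(s[:k] == tuple(buf[-k:]) for s in stops):
                keep = k
                break
        yield from buf[:len(buf) - keep]
        buf = buf[len(buf) - keep:]
    yield from buf  # stream ended with no stop: flush the held-back suffix
```

With stop sequence `(1, 2, 1)` and stream `5, 1, 2, 1, 9` this yields only `5`; with stream `9, 1, 2` (a partial match at the end) it yields all three tokens.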

Requirements

  • Must work for overlapping matches (e.g., stop sequence [1,2,1] and stream ...1,2,1).
  • Must handle cases where a partial stop sequence appears at the end of the current buffer and completes with future tokens.
  • Efficient: do not rescan the entire history per token.
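
To satisfy the no-rescanning requirement even with many long, overlapping stop sequences, one standard choice (not mandated by the problem) is an Aho-Corasick automaton: build a trie with failure links and match lengths once, then advance a single state per token in O(1) amortized time. A sketch with illustrative names:

```python
from collections import deque

class StopAutomaton:
    """Aho-Corasick automaton over stop sequences of token IDs."""

    def __init__(self, stop_seqs):
        self.goto = [{}]    # state -> {token: next state}
        self.fail = [0]     # failure links
        self.depth = [0]    # trie depth = length of partial match at state
        self.hit = [0]      # longest stop sequence ending at state (0 = none)
        for seq in stop_seqs:
            state = 0
            for tok in seq:
                if tok not in self.goto[state]:
                    self.goto[state][tok] = len(self.goto)
                    self.goto.append({})
                    self.fail.append(0)
                    self.depth.append(self.depth[state] + 1)
                    self.hit.append(0)
                state = self.goto[state][tok]
            self.hit[state] = max(self.hit[state], len(seq))
        # Breadth-first pass to set failure links and propagate matches.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for tok, nxt in self.goto[s].items():
                f = self.fail[s]
                while f and tok not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(tok, 0)
                self.hit[nxt] = max(self.hit[nxt], self.hit[self.fail[nxt]])
                queue.append(nxt)
        self.state = 0

    def step(self, tok):
        """Consume one token; return the length of a completed stop
        sequence (0 if none). self.depth[self.state] is the number of
        trailing tokens that are still a live partial match."""
        while self.state and tok not in self.goto[self.state]:
            self.state = self.fail[self.state]
        self.state = self.goto[self.state].get(tok, 0)
        return self.hit[self.state]


def stream_with_automaton(tokens, stop_seqs):
    """Yield tokens up to (not including) the first stop occurrence."""
    ac = StopAutomaton([tuple(s) for s in stop_seqs])
    held = []
    for tok in tokens:
        held.append(tok)
        matched = ac.step(tok)
        if matched:
            yield from held[:len(held) - matched]
            return
        keep = ac.depth[ac.state]  # tokens that may begin a stop sequence
        yield from held[:len(held) - keep]
        held = held[len(held) - keep:]
    yield from held                # no stop found: release everything
```

Build cost is O(total stop-sequence length); each streamed token then costs O(1) amortized, and at most (longest stop length − 1) tokens are ever held back, so the history is never rescanned.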

Clarifications

  • If multiple stop sequences could match ending at the same position, stopping is immediate regardless of which matched.
  • If the stream ends without a stop sequence, return all tokens.
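
Both clarifications can be captured by a tiny offline oracle, useful for testing a streaming implementation against: it reports the first end position at which any stop sequence matches, and `None` when the stream runs out mid-partial-match (`first_stop_end` is a hypothetical helper name):

```python
def first_stop_end(tokens, stops):
    """Return the index just past the first stop-sequence occurrence,
    scanning end positions left to right, or None if there is none."""
    for i in range(1, len(tokens) + 1):
        if any(i >= len(s) and tokens[i - len(s):i] == list(s) for s in stops):
            return i
    return None

# Two stops ([1, 2, 1] and [2, 1]) both end at position 4: stopping is
# immediate there, regardless of which one matched.
first_stop_end([9, 1, 2, 1], [(1, 2, 1), (2, 1)])  # -> 4
# The stream ends while [1, 2, 1] is only partially matched: no stop,
# so the caller returns all tokens.
first_stop_end([9, 1, 2], [(1, 2, 1)])             # -> None
```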

Describe your approach and complexity; implement in your chosen language.
