PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates streaming sequence-detection and pattern-matching skills, including online buffer management and correct handling of overlapping and partial multi-token stop sequences in LLM inference, and falls under the Coding & Algorithms category with domain focus on streaming algorithms for machine learning inference.

  • medium
  • Microsoft
  • Coding & Algorithms
  • Machine Learning Engineer

Detect stop tokens during streaming inference

Company: Microsoft

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

## Problem: Stop-token / stop-sequence detection in streaming generation During LLM inference you receive tokens incrementally (streaming). Implement logic that decides when to stop generation based on one or more **stop sequences**. ### Input - A stream/iterator of generated token IDs (or strings). - A list of stop sequences, where each stop sequence can be: - a single token ID, or - a list of token IDs representing a multi-token sequence (e.g., `[A, B, C]`). ### Output / behavior - As tokens arrive, emit generated tokens **up to but not including** the first occurrence of any stop sequence. - Stop as soon as any stop sequence is detected. ### Requirements - Must work for overlapping matches (e.g., stop sequences `[1,2,1]` and stream `...1,2,1`). - Must handle cases where a partial stop sequence appears at the end of the current buffer and completes with future tokens. - Efficient: do not rescan the entire history per token. ### Clarifications - If multiple stop sequences could match ending at the same position, stopping is immediate regardless of which matched. - If the stream ends without a stop sequence, return all tokens. Describe your approach and complexity; implement in your chosen language.

Quick Answer: This question evaluates streaming sequence-detection and pattern-matching skills, including online buffer management and correct handling of overlapping and partial multi-token stop sequences in LLM inference, and falls under the Coding & Algorithms category with domain focus on streaming algorithms for machine learning inference.

During streaming generation, tokens arrive one by one. Implement a function that returns exactly what should be emitted before generation must stop.\n\nYou are given:\n- stream: a list of generated token IDs\n- stop_sequences: a list where each element is either a single integer token or a list of integers\n\nProcess the stream from left to right as if tokens were arriving online. As soon as any stop sequence is fully matched, stop immediately and return all tokens before that matched stop sequence. The stop sequence itself must not appear in the result.\n\nYour solution must:\n- handle overlapping stop sequences\n- keep partial matches that may complete with future tokens\n- avoid rescanning the full history for every new token\n\nIf the stream ends before any stop sequence appears, return the entire stream. If stop_sequences is empty, return the whole stream.

Constraints

  • 0 <= len(stream) <= 200000
  • 0 <= len(stop_sequences) <= 200000
  • The sum of lengths of all normalized stop sequences is at most 200000
  • Each non-empty stop sequence has length at least 1
  • Token IDs are integers

Examples

Input: ([5, 7, 8, 9, 10], [[8, 9]])

Expected Output: [5, 7]

Explanation: The first stop sequence [8, 9] starts at index 2, so only the tokens before it are returned.

Input: ([4, 1, 2, 1, 3], [[1, 2, 1]])

Expected Output: [4]

Explanation: The stop sequence overlaps with its own prefix and suffix, but it is still detected as soon as the final 1 arrives.

Input: ([1, 2, 4], [[1, 2, 3]])

Expected Output: [1, 2, 4]

Explanation: The prefix [1, 2] is only a partial match. Since the stream never produces 3, nothing is removed.

Input: ([6, 7, 8], [7])

Expected Output: [6]

Explanation: A stop sequence can be given as a single integer. Generation stops immediately when 7 arrives.

Input: ([5, 6, 7, 8], [[7, 8], 6])

Expected Output: [5]

Explanation: The single-token stop 6 is detected before the longer stop [7, 8] can complete.

Input: ([], [[1, 2]])

Expected Output: []

Explanation: With no generated tokens, the result is empty.

Input: ([2, 3, 4], [])

Expected Output: [2, 3, 4]

Explanation: Without any stop sequences, the entire stream is returned.

Hints

  1. A trie lets you match many stop sequences at once instead of checking every stop sequence against every suffix of the stream.
  2. You only need to keep the last max_stop_length - 1 un-emitted tokens in a pending buffer. Anything older can no longer be part of a future stop sequence.
Last updated: May 16, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Return Top K Open Businesses - Microsoft (hard)
  • Implement Memory Allocation and In-Memory Records - Microsoft (medium)
  • Implement K-Means and Detect Divisible Subarrays - Microsoft (medium)
  • Sort Three Categories In Place - Microsoft (medium)
  • Retain Top K Elements - Microsoft (medium)