How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Microsoft.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at Microsoft during technical interviews.

Detect stop tokens during streaming inference

Quick Overview

This question evaluates streaming sequence-detection and pattern-matching skills, including online buffer management and correct handling of overlapping and partial multi-token stop sequences in LLM inference, and falls under the Coding & Algorithms category with domain focus on streaming algorithms for machine learning inference.

Company: Microsoft

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

## Problem: Stop-token / stop-sequence detection in streaming generation During LLM inference you receive tokens incrementally (streaming). Implement logic that decides when to stop generation based on one or more **stop sequences**. ### Input - A stream/iterator of generated token IDs (or strings). - A list of stop sequences, where each stop sequence can be: - a single token ID, or - a list of token IDs representing a multi-token sequence (e.g., `[A, B, C]`). ### Output / behavior - As tokens arrive, emit generated tokens **up to but not including** the first occurrence of any stop sequence. - Stop as soon as any stop sequence is detected. ### Requirements - Must work for overlapping matches (e.g., stop sequences `[1,2,1]` and stream `...1,2,1`). - Must handle cases where a partial stop sequence appears at the end of the current buffer and completes with future tokens. - Efficient: do not rescan the entire history per token. ### Clarifications - If multiple stop sequences could match ending at the same position, stopping is immediate regardless of which matched. - If the stream ends without a stop sequence, return all tokens. Describe your approach and complexity; implement in your chosen language.

Quick Answer: This question evaluates streaming sequence-detection and pattern-matching skills, including online buffer management and correct handling of overlapping and partial multi-token stop sequences in LLM inference, and falls under the Coding & Algorithms category with domain focus on streaming algorithms for machine learning inference.

During streaming generation, tokens arrive one by one. Implement a function that returns exactly what should be emitted before generation must stop.\n\nYou are given:\n- stream: a list of generated token IDs\n- stop_sequences: a list where each element is either a single integer token or a list of integers\n\nProcess the stream from left to right as if tokens were arriving online. As soon as any stop sequence is fully matched, stop immediately and return all tokens before that matched stop sequence. The stop sequence itself must not appear in the result.\n\nYour solution must:\n- handle overlapping stop sequences\n- keep partial matches that may complete with future tokens\n- avoid rescanning the full history for every new token\n\nIf the stream ends before any stop sequence appears, return the entire stream. If stop_sequences is empty, return the whole stream.

Constraints

0 <= len(stream) <= 200000
0 <= len(stop_sequences) <= 200000
The sum of lengths of all normalized stop sequences is at most 200000
Each non-empty stop sequence has length at least 1
Token IDs are integers

Examples

Input: ([5, 7, 8, 9, 10], [[8, 9]])

Expected Output: [5, 7]

Explanation: The first stop sequence [8, 9] starts at index 2, so only the tokens before it are returned.

Input: ([4, 1, 2, 1, 3], [[1, 2, 1]])

Expected Output: [4]

Explanation: The stop sequence overlaps with its own prefix and suffix, but it is still detected as soon as the final 1 arrives.

Input: ([1, 2, 4], [[1, 2, 3]])

Expected Output: [1, 2, 4]

Explanation: The prefix [1, 2] is only a partial match. Since the stream never produces 3, nothing is removed.

Input: ([6, 7, 8], [7])

Expected Output: [6]

Explanation: A stop sequence can be given as a single integer. Generation stops immediately when 7 arrives.

Input: ([5, 6, 7, 8], [[7, 8], 6])

Expected Output: [5]

Explanation: The single-token stop 6 is detected before the longer stop [7, 8] can complete.

Input: ([], [[1, 2]])

Expected Output: []

Explanation: With no generated tokens, the result is empty.

Input: ([2, 3, 4], [])

Expected Output: [2, 3, 4]

Explanation: Without any stop sequences, the entire stream is returned.

Hints

A trie lets you match many stop sequences at once instead of checking every stop sequence against every suffix of the stream.
You only need to keep the last max_stop_length - 1 un-emitted tokens in a pending buffer. Anything older can no longer be part of a future stop sequence.

Quick Overview