PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates streaming string processing and pattern-matching skills, specifically handling token detection across chunk boundaries, boundary conditions, and time/space efficiency. It is commonly asked in the Coding & Algorithms domain to assess practical application-level competence in stream-processing algorithms rather than purely theoretical understanding.

  • medium
  • Microsoft
  • Coding & Algorithms
  • Software Engineer

Stream output until stop token appears

Company: Microsoft

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

You are implementing a simplified **streaming conversational AI output filter**. Text arrives in **chunks** (strings) in order. There is a special **stop token** (e.g., `"<END>"`). Your job is to output **only the text that occurs before the first stop token**, and ignore everything from the stop token onward. ### Task Given: - `chunks`: a list of strings representing the stream chunks in arrival order - `stopToken`: a non-empty string (e.g., `"<END>"`) Return a list of strings representing what should be emitted to the user: - Emit all content **before** the first occurrence of `stopToken`. - Do **not** include `stopToken` itself. - If `stopToken` occurs in the middle of a chunk, emit only the prefix before it. - Once the stop token is detected, ignore all remaining input. ### Examples 1) Stop token fully contained in a chunk: - `chunks = ["hello", "how are you<END>", "HAPPY"]` - Output: `["hello", "how are you"]` 2) Stop token may be **split across chunk boundaries** (truncated in one chunk and completed in the next): - `chunks = ["hello", "how are you<E", "ND>HAPPY"]` - Output: `["hello", "how are you"]` ### Requirements / Constraints - Treat input as a stream: you should not rely on concatenating the entire stream into one giant string. - Aim for **O(total characters)** time. - Extra space should be small (e.g., proportional to `len(stopToken)`). *(You may assume ASCII/UTF-8 strings; exact character set is not important as long as matching is consistent.)*

Quick Answer: This question evaluates streaming string processing and pattern-matching skills, specifically handling token detection across chunk boundaries, boundary conditions, and time/space efficiency. It is commonly asked in the Coding & Algorithms domain to assess practical application-level competence in stream-processing algorithms rather than purely theoretical understanding.

You are implementing a simplified streaming output filter. Text arrives as ordered string chunks, and there is a special non-empty stop token such as "<END>". Return the non-empty pieces that should be emitted to the user, in order: - Emit only text that appears before the first occurrence of `stopToken`. - Do not include `stopToken` itself. - If `stopToken` starts in one chunk and finishes in a later chunk, it still counts as a match. - After the first match, ignore the rest of the stream. - While processing each chunk, combine all characters that become safe to emit during that chunk into one output piece. - If the stream ends without ever seeing `stopToken`, flush any remaining buffered characters as one final piece. The concatenation of the returned list must equal exactly the full stream prefix that appears before the first stop token.

Constraints

  • 0 <= len(chunks) <= 10^5
  • 1 <= len(stopToken) <= 10^5
  • 0 <= sum(len(chunk) for chunk in chunks) <= 2 * 10^5
  • Chunks may contain empty strings
  • Aim for O(total_characters + len(stopToken)) time and O(len(stopToken)) extra space

Examples

Input: (["hello", "how are you<END>", "HAPPY"], "<END>")

Expected Output: ["hello", "how are you"]

Explanation: The stop token appears completely inside the second chunk, so emit everything before it and ignore the rest.

Input: (["hello", "how are you<E", "ND>HAPPY"], "<END>")

Expected Output: ["hello", "how are you"]

Explanation: The token is split across chunk boundaries. The trailing "<E" from the second chunk must be buffered, not emitted.

Input: (["ab<E", "Xcd"], "<END>")

Expected Output: ["ab", "<EXcd"]

Explanation: After seeing "<E" you must wait, but the next character is "X", so the buffered text is no longer part of the stop token and becomes safe to emit.

Input: (["<", "END>", "later"], "<END>")

Expected Output: []

Explanation: The stream starts with the stop token, split across chunks, so nothing is emitted.

Input: (["abc", "#def", "ghi"], "#")

Expected Output: ["abc"]

Explanation: With a one-character stop token, emission stops immediately when that character is first seen.

Input: (["", "abc", "", "<EN", "D>", "tail"], "<END>")

Expected Output: ["abc"]

Explanation: Empty chunks should be handled correctly. The stop token appears later across two chunks, so only "abc" is emitted.

Input: (["abc<EN"], "<END>")

Expected Output: ["abc", "<EN"]

Explanation: The stream ends before the buffered partial match can become a full stop token, so the remaining buffered suffix must be flushed at end-of-stream.

Input: (["aaaa", "c"], "aaab")

Expected Output: ["a", "aaac"]

Explanation: Because the stop token overlaps with itself, only one "a" is safe after the first chunk. When "c" arrives, the buffered "aaa" can no longer form "aaab" and becomes safe.

Hints

  1. At any moment, the only text you must keep is the longest suffix of already-seen characters that could still be the beginning of `stopToken`.
  2. A prefix-function / KMP-style matcher lets you update that suffix length in amortized O(1) per character, even when the token overlaps with itself.
Last updated: Jun 5, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Return Top K Open Businesses - Microsoft (hard)
  • Implement Memory Allocation and In-Memory Records - Microsoft (medium)
  • Implement K-Means and Detect Divisible Subarrays - Microsoft (medium)
  • Sort Three Categories In Place - Microsoft (medium)
  • Retain Top K Elements - Microsoft (medium)