PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates streaming string processing, incremental substring detection across chunk boundaries, and memory-efficient algorithm design for low-latency consumption of potentially large text streams.

  • hard
  • Microsoft
  • Coding & Algorithms
  • Software Engineer

Extract stream prefix before stop token

Company: Microsoft

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are consuming a text stream delivered as an iterator/generator of **string chunks** (each chunk may be any length, including empty). You are also given a **stop token** string `token`. Concatenate the chunks conceptually into one long string `S`. Your task is to return the **prefix of `S` that appears strictly before the first occurrence of `token`**. Key requirements: - The stop token may be **split across chunk boundaries** (e.g., `"ab"` in one chunk and `"c"` in the next forms token `"abc"`). - You must be able to **return as soon as the first full token is detected**. You are **not allowed** to buffer the entire stream and then search. - Aim for low extra memory usage (e.g., proportional to `len(token)` rather than total stream size). If the token never appears, return the entire concatenated stream content (up to end-of-stream). ### Function signature (example) - Input: `chunks: Iterator[str]`, `token: str` - Output: `str` ### Examples 1) `chunks = ["hello ", "wor", "ld<END>", "ignored"]`, `token = "<END>"` → return `"hello world"` 2) `chunks = ["abc", "d", "ef"]`, `token = "cde"` (token crosses boundary) → return `"ab"` 3) `chunks = ["aaaaa"]`, `token = "b"` → return `"aaaaa"` ### Constraints (reasonable interview assumptions) - `1 <= len(token) <= 10^5` - Total streamed characters can be large (potentially millions), so the solution should be close to linear time in the number of consumed characters. Implement the function to satisfy the requirements above.

Quick Answer: This question evaluates streaming string processing, incremental substring detection across chunk boundaries, and memory-efficient algorithm design for low-latency consumption of potentially large text streams.

Given text chunks and a stop token, return the stream prefix before the first stop-token occurrence, including when the token spans chunks.

Constraints

  • Inputs are provided as Python literals compatible with the function signature.
  • Return a deterministic value exactly matching the requested output.

Examples

Input: (['abc', 'deSTOPfg'], 'STOP')

Expected Output: 'abcde'

Explanation: Inside one chunk.

Input: (['hello ST', 'OP world'], 'STOP')

Expected Output: 'hello '

Explanation: Across chunks.

Input: (['no', ' marker'], '<end>')

Expected Output: 'no marker'

Explanation: Absent stop token.

Input: (['abc'], '')

Expected Output: ''

Explanation: Empty stop token.

Hints

  1. Start with a direct data structure representation.
  2. Handle edge cases before the main loop.
Last updated: Jun 27, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Return Top K Open Businesses - Microsoft (hard)
  • Implement Memory Allocation and In-Memory Records - Microsoft (medium)
  • Sort Three Categories In Place - Microsoft (medium)
  • Implement K-Means and Detect Divisible Subarrays - Microsoft (medium)
  • Retain Top K Elements - Microsoft (medium)