
Implement dynamic batching for token decoding

Last updated: Mar 29, 2026

Quick Overview

This question evaluates dynamic batching, per-request state management, and sequence-decoding correctness for language-model inference, including handling stop conditions, max-token limits, and maintaining a correct slot-to-request mapping.


Company: xAI

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite


You are given a black-box “simulated language model” interface that can advance many sequences in a batch.

Model interface

  • Tokens are integers.
  • model_next(batch_prefixes) -> next_tokens
    • batch_prefixes is a list of token lists, one per active sequence in the current batch.
    • next_tokens is a list of integers of the same length, where next_tokens[i] is the next generated token for batch_prefixes[i].
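
Since the model is a black box, any deterministic stand-in works for exercising the batching logic locally. A minimal sketch, where the next-token rule (sum of the prefix modulo a small vocabulary) is an arbitrary assumption for testing only:

```python
from typing import List

def model_next(batch_prefixes: List[List[int]]) -> List[int]:
    """Toy stand-in for the black-box model, for local testing only.
    Returns one next token per active sequence; the rule (sum of the
    prefix modulo a small vocabulary) is an arbitrary assumption."""
    return [sum(prefix) % 50 for prefix in batch_prefixes]
```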

Requests

Each request/sequence has:

  • prompt_tokens: initial prefix tokens
  • max_tokens: maximum number of generated tokens allowed (not counting the prompt)
  • A stopping rule:
    • either a stop_token (single token), or
    • a stop_sequence (a list of tokens that, when it appears as a suffix of the generated output, ends generation)
  • A callback to return the final generated tokens (or you may collect results to return at the end).
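
One natural way to carry per-request state is a small dataclass that owns its generated tokens and knows how to check its own stop conditions. A sketch, with field and method names chosen here for illustration (they are not prescribed by the problem):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Request:
    request_id: int
    prompt_tokens: List[int]
    max_tokens: int                            # cap on generated tokens only
    stop_token: Optional[int] = None           # single-token stop rule, or...
    stop_sequence: Optional[List[int]] = None  # ...multi-token suffix rule
    on_done: Optional[Callable[[List[int]], None]] = None  # optional callback
    generated: List[int] = field(default_factory=list)     # per-request decode state

    def is_finished(self) -> bool:
        """True once max_tokens is reached or a stop rule has fired."""
        if len(self.generated) >= self.max_tokens:
            return True
        if (self.stop_token is not None and self.generated
                and self.generated[-1] == self.stop_token):
            return True
        if self.stop_sequence:
            n = len(self.stop_sequence)
            if len(self.generated) >= n and self.generated[-n:] == self.stop_sequence:
                return True
        return False
```

Comparing the suffix after each appended token costs O(len(stop_sequence)) per step, which is fine at this scale; very long stop sequences could use an incremental matcher instead.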

Batch execution requirement

Implement a decoding/sampling engine with dynamic batching:

  • There is a fixed batch capacity B.
  • Requests arrive in a waiting queue (you can assume they are all available initially, or you can model an input queue).
  • You repeatedly call model_next to advance active sequences.
  • Sequences may finish at different times due to:
    • reaching max_tokens, or
    • hitting the stop condition.
  • When a sequence finishes, its slot becomes free and should be refilled from the waiting queue if possible.
  • Near the end, the batch may be partially filled; your code must handle len(active) < B correctly.

Correctness requirement

Maintain a correct mapping between batch slots and requests so that tokens and final outputs are never mixed up after refilling (e.g., via a slot_id -> request_id mapping).
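
A minimal sketch of the full loop, assuming the Request class and the model_next stub above. Each batch slot holds the Request object itself, so the slot-to-request mapping can never drift: the list is compacted after every step, finished sequences are retired, and free capacity is refilled from the waiting queue before the next model_next call.

```python
from collections import deque
from typing import Callable, Dict, List

def run_decoding(requests: List[Request],
                 model_next: Callable[[List[List[int]]], List[int]],
                 batch_capacity: int) -> Dict[int, List[int]]:
    """Drive all requests to completion with dynamic batching."""
    waiting = deque(requests)        # input queue (all available up front)
    active: List[Request] = []       # batch slots; len(active) <= batch_capacity
    results: Dict[int, List[int]] = {}

    while waiting or active:
        # Refill free slots from the waiting queue.
        while waiting and len(active) < batch_capacity:
            active.append(waiting.popleft())

        # One decode step over the (possibly partially filled) batch.
        prefixes = [r.prompt_tokens + r.generated for r in active]
        next_tokens = model_next(prefixes)
        for req, tok in zip(active, next_tokens):
            req.generated.append(tok)

        # Retire finished sequences; their slots free up for the refill
        # at the top of the next iteration.
        still_active = []
        for req in active:
            if req.is_finished():
                results[req.request_id] = req.generated
                if req.on_done:
                    req.on_done(req.generated)
            else:
                still_active.append(req)
        active = still_active

    return results
```

One caveat worth stating in an interview: this sketch checks stop conditions only after appending a token, so a degenerate request with max_tokens == 0 would still receive one decode step; a production loop would also check is_finished at admission time.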

Task

Write a function (or class) that runs this dynamic-batching decoding loop until all requests are completed, and returns (or delivers via callback) the generated outputs per request.

Clearly define:

  • your data structures (active slots, waiting queue, per-request state),
  • the main loop and termination condition,
  • how you detect stop conditions (especially stop_sequence),
  • and how you handle partially filled batches.
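
An end-to-end run with the toy model, where all token values and stop rules below are invented for the demo. With batch capacity 2 and three requests, the third request is admitted only once a slot frees up, and the final iterations run with a partially filled batch:

```python
if __name__ == "__main__":
    reqs = [
        Request(request_id=0, prompt_tokens=[1, 2, 3], max_tokens=5, stop_token=7),
        Request(request_id=1, prompt_tokens=[4], max_tokens=8, stop_sequence=[10, 11]),
        Request(request_id=2, prompt_tokens=[9, 9], max_tokens=3),
    ]
    outputs = run_decoding(reqs, model_next, batch_capacity=2)
    for rid, toks in sorted(outputs.items()):
        print(f"request {rid}: {toks}")
```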
