PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates proficiency with autoregressive sequence generation, probabilistic sampling methods (greedy, top-k, top-p), softmax-based probability manipulation, and correct handling of end-of-sequence tokens.

  • medium
  • Cohere
  • Coding & Algorithms
  • Machine Learning Engineer

Implement Autoregressive Decoding Strategies

Company: Cohere

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

You are given a simplified autoregressive language model. The model receives a 1D sequence of token IDs and returns logits for the next token. The vocabulary is indexed from `0` to `vocab_size - 1`, and the end-of-sequence token, `EOS`, is the last token: `vocab_size - 1`. Implement decoding methods that repeatedly call `forward` to generate one token at a time. After each generated token, append it to the model input before generating the next token. Stop when `EOS` is generated. Return only the newly generated output tokens, not the original prompt, and do not include the `EOS` token in the returned output. ```python from typing import Union import numpy as np class RandomGPT: """ This class simulates a GPT-like autoregressive model. Assumptions: - Vocabulary IDs are sequential integers from 0 to vocab_size - 1. - The EOS token is vocab_size - 1. - Inputs and outputs are 1D arrays or lists of integers. - Decoding is autoregressive: prompt plus generated tokens are fed back into the model. """ def __init__(self): self.vocab_size = 10 self.seed_start = 10 self.rng = np.random.default_rng(self.seed_start) def forward(self, x: Union[list, np.ndarray]) -> np.ndarray: """ Return simulated logits as a 1D array of length vocab_size. The same input sequence always produces the same logits. """ x = np.array(x) if x.ndim != 1: raise ValueError("Input x must be a 1D array") seed = np.sum(x) + len(x) + np.sum(x * np.arange(len(x))) rng = np.random.default_rng(self.seed_start + seed) logits = (rng.random(self.vocab_size) - 0.5) * 2 return logits def softmax(self, logits: Union[list, np.ndarray]) -> np.ndarray: """ Convert logits into probabilities. """ return np.exp(logits) / np.sum(np.exp(logits)) def greedy_decode(self, input_tokens: Union[list, np.ndarray]) -> Union[list, np.ndarray]: """ Generate tokens by always selecting the token with the largest logit. Stop when EOS is generated. Return generated tokens excluding EOS. """ pass def topk_decode(self, input_tokens: Union[list, np.ndarray], k: int) -> Union[list, np.ndarray]: """ Generate tokens using top-k sampling. At each step: 1. Convert logits to probabilities. 2. Keep only the k tokens with the highest probabilities. 3. Renormalize probabilities over those k tokens. 4. Sample one token from that restricted distribution using self.rng. 5. Stop when EOS is generated. Return generated tokens excluding EOS. """ pass def topp_decode(self, input_tokens: Union[list, np.ndarray], p: float) -> Union[list, np.ndarray]: """ Generate tokens using top-p, also called nucleus, sampling. At each step: 1. Convert logits to probabilities. 2. Sort tokens by descending probability. 3. Keep the smallest prefix of tokens whose cumulative probability is at least p. 4. Renormalize probabilities over that prefix. 5. Sample one token from that restricted distribution using self.rng. 6. Stop when EOS is generated. Return generated tokens excluding EOS. """ pass ``` Requirements: - `greedy_decode` should use `argmax` over logits or probabilities. - `topk_decode` should validate that `1 <= k <= vocab_size`. - `topp_decode` should validate that `0 < p <= 1`. - Sampling-based methods should use `self.rng` so that results are reproducible after constructing a fresh `RandomGPT` object. - You may assume the simulated model eventually generates `EOS`.

Quick Answer: This question evaluates proficiency with autoregressive sequence generation, probabilistic sampling methods (greedy, top-k, top-p), softmax-based probability manipulation, and correct handling of end-of-sequence tokens.

Implement a platform-friendly version of autoregressive decoding. You are given precomputed model outputs in `logits_by_context`, which acts like a simplified `forward` function: for the current input sequence, look up `logits_by_context[tuple(current_sequence)]` to get the logits for the next token. The vocabulary is indexed from `0` to `vocab_size - 1`, and the EOS token is always `vocab_size - 1`. Generate one token at a time. After generating a non-EOS token, append it to the current model input before generating the next token. Stop when EOS is generated. Return only newly generated tokens, excluding EOS. Support three decoding methods: - `greedy`: choose the token with the largest logit. If tied, choose the smaller token ID. - `topk`: convert logits to probabilities using softmax, keep the `k` highest-probability tokens, renormalize over them, and sample using the next value from `random_values`. - `topp`: convert logits to probabilities using softmax, sort tokens by descending probability, keep the smallest prefix whose cumulative probability is at least `p`, renormalize over that prefix, and sample using the next value from `random_values`. For deterministic sampling, each sampling step consumes one value `r` from `random_values`, where `0 <= r < 1`. Choose the first candidate whose cumulative renormalized probability is greater than `r`.

Constraints

  • 1 <= vocab_size <= 100
  • EOS token ID is vocab_size - 1
  • `logits_by_context` contains logits for every context reached before EOS
  • `method` is one of 'greedy', 'topk', or 'topp'
  • For top-k, `param` must be an integer k such that 1 <= k <= vocab_size
  • For top-p, `param` must satisfy 0 < p <= 1
  • Every value in `random_values` must satisfy 0 <= r < 1

Examples

Input: ({(2,): [0.1, 2.0, -1.0, 0.0, 1.5], (2, 1): [0.0, -0.5, 3.0, 1.0, 2.0], (2, 1, 2): [-1.0, 0.0, 0.5, 1.0, 2.5]}, [2], 'greedy', 0, [])

Expected Output: [1, 2]

Explanation: Greedy chooses token 1, then token 2, then EOS token 4. EOS is not included in the returned output.

Input: ({(): [0.0, 0.0, -0.5, 1.0]}, [], 'greedy', 0, [])

Expected Output: []

Explanation: The empty prompt immediately generates EOS token 3, so no generated tokens are returned.

Input: ({(0,): [2.0, 1.0, 0.0, -1.0, 0.5], (0, 1): [0.0, 3.0, 2.0, 1.0, -1.0], (0, 1, 1): [-1.0, 0.0, 0.5, 1.0, 2.0]}, [0], 'topk', 2, [0.8, 0.2, 0.1])

Expected Output: [1, 1]

Explanation: With k=2, the first sample chooses token 1, the second sample chooses token 1 again, and the third sample chooses EOS token 4.

Input: ({(1,): [2.0, 1.0, 0.0, -1.0], (1, 1): [0.0, 0.0, 0.0, 2.0]}, [1], 'topp', 0.7, [0.9, 0.4])

Expected Output: [1]

Explanation: For the first step, top-p keeps tokens 0 and 1; random value 0.9 selects token 1. The next context's nucleus contains only EOS token 3.

Input: ({(5,): [0.0, 0.0, 0.0]}, [5], 'topp', 1.0, [0.7])

Expected Output: []

Explanation: With p=1.0 all tied tokens are included in token-ID order. Random value 0.7 selects token 2, which is EOS.

Hints

  1. Keep two sequences: the mutable current model input and the generated output tokens to return.
  2. For top-k and top-p, sort token IDs by probability descending, using token ID ascending as the tie-breaker.
Last updated: Jun 13, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.