
Predict most likely next word from training data

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in frequency-based language modeling and data-structure design, focusing on token sequence analysis, frequency counting, and handling boundary conditions.

  • medium
  • Bloomberg
  • Coding & Algorithms
  • Software Engineer


Company: Bloomberg

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite



Feb 9, 2026, 12:00 AM

You are given tokenized training data consisting of multiple sentences (each sentence is a list of words). Build a lightweight model/data structure that can answer queries of the form: given a word w, return the most likely next word that follows w in the training data.

Example training data:

training_data = [["I", "am", "sam"], ["am", "sam"]]

From this data:

  • Query "I" → output "am"
  • Query "am" → output "sam"

Requirements

  1. Training: Process training_data once to build the model.
  2. Query: next_word(w) should return the most frequent word that immediately follows w across all sentences.
  3. If w never appears followed by another word (e.g., w only appears at sentence end or never appears), return a sentinel such as null or an empty string.
  4. If there is a tie for most frequent next word, you may return any tied word (or specify a deterministic tie-break such as lexicographically smallest).
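The requirements above can be met with a table of bigram counts. The following is a minimal sketch in Python, not a reference solution: the class name `NextWordModel`, the use of `None` as the sentinel, and the lexicographic tie-break are all assumptions chosen for illustration. Training is a single pass over the tokens; a query scans the followers of one word.

```python
from collections import Counter, defaultdict


class NextWordModel:
    """Bigram frequency model: for each word, count the words that follow it."""

    def __init__(self, training_data):
        # followers[w][v] = number of times v immediately follows w
        self.followers = defaultdict(Counter)
        for sentence in training_data:
            # zip pairs each token with its successor; the last token has none
            for current, following in zip(sentence, sentence[1:]):
                self.followers[current][following] += 1

    def next_word(self, w):
        # .get avoids creating an empty entry for unseen words
        counts = self.followers.get(w)
        if not counts:
            return None  # assumed sentinel: w is unseen or only ends sentences
        # Highest count wins; ties go to the lexicographically smallest word
        return min(counts, key=lambda v: (-counts[v], v))


training_data = [["I", "am", "sam"], ["am", "sam"]]
model = NextWordModel(training_data)
print(model.next_word("I"))    # am
print(model.next_word("am"))   # sam
print(model.next_word("sam"))  # None
```

In this sketch a query costs time proportional to the number of distinct followers of `w`. If queries dominate, the best follower of each word could instead be cached once training ends.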

Follow-ups (discuss tradeoffs)

  • Can you do better than O(N) space in terms of total tokens N in the training set? Under what assumptions?
  • What other data structures could you use (e.g., hash maps vs. tries vs. compressed representations)?
  • Approximately how many unique words / transitions can you store given a memory budget (e.g., explain what dominates memory usage)?
  • How would you extend this to autocomplete an entire sentence, e.g., repeatedly predict the next word until an end-of-sentence token or a max length is reached?
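Two of these follow-ups can be sketched together. Assuming no further training data arrives after the build, the count table can be collapsed to one best follower per distinct word, so the stored model grows with the vocabulary rather than with the total number of tokens (the full bigram counts are still held while training runs). Sentence autocomplete then becomes a greedy walk over that map. The function names, the `"</s>"` end token, and the default `max_len` below are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def train_best_next(training_data):
    """Count bigrams, then keep only the top follower of each word."""
    followers = defaultdict(Counter)
    for sentence in training_data:
        for current, following in zip(sentence, sentence[1:]):
            followers[current][following] += 1
    # One entry per distinct word is enough to answer queries.
    # Assumption: the model is frozen here, because the counts are discarded.
    return {
        w: min(counts, key=lambda v: (-counts[v], v))
        for w, counts in followers.items()
    }


def autocomplete(best_next, start, max_len=10, end_token="</s>"):
    """Greedily extend `start` until end_token, a dead end, or max_len words."""
    sentence = [start]
    while len(sentence) < max_len:
        nxt = best_next.get(sentence[-1])
        if nxt is None or nxt == end_token:
            break
        sentence.append(nxt)
    return sentence


best_next = train_best_next([["I", "am", "sam"], ["am", "sam"]])
print(autocomplete(best_next, "I"))  # ['I', 'am', 'sam']
```

The `max_len` cap matters because greedy prediction is deterministic: if the best-follower map contains a cycle, the walk would otherwise never stop.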
