PracHub

Build a bigram next-word predictor with weighted sampling

Last updated: Mar 29, 2026

Quick Overview

This question tests basic probabilistic language modeling: bigram (first-order Markov) models and frequency-weighted sampling for next-word prediction. It sits in the machine learning / natural language processing domain and weighs practical implementation skill alongside the ability to reason about scalability trade-offs.

  • medium
  • Google
  • Machine Learning
  • Software Engineer

Company: Google

Role: Software Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

You are given a training set of token sequences (sentences), for example:

```
[["a","b","c"],
 ["a","s","d"]]
```

1) Train a simple **next-word prediction** model that, for each word `w`, counts how often each word appears **immediately after** `w` (a bigram / first-order Markov model).
2) At inference time, given a current word `w`, output a **random next word** sampled **proportionally to the observed counts** after `w` (i.e., weighted by frequency).
3) Discuss what you would do if the vocabulary and/or the number of distinct next words per token is very large (memory and latency constraints).
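
A minimal Python sketch of parts 1 and 2, assuming tokens are plain strings and that predicting for a word with no observed successor should return `None` (the question does not say). The names `BigramModel`, `train`, and `predict` are hypothetical. Counts are kept in a dict of `collections.Counter` objects, and sampling uses the standard library's `random.choices` with the raw counts as weights, so each follower is drawn with probability count / total.

```python
from __future__ import annotations

import random
from collections import Counter, defaultdict


class BigramModel:
    """Bigram (first-order Markov) next-word predictor (hypothetical name)."""

    def __init__(self) -> None:
        # counts[w][nxt] = how many times `nxt` appeared immediately after `w`
        self.counts: defaultdict[str, Counter[str]] = defaultdict(Counter)

    def train(self, sentences: list[list[str]]) -> None:
        for sentence in sentences:
            # zip pairs each token with the token that immediately follows it
            for prev, nxt in zip(sentence, sentence[1:]):
                self.counts[prev][nxt] += 1

    def predict(self, word: str, rng: random.Random | None = None) -> str | None:
        """Return a next word sampled in proportion to its observed count."""
        followers = self.counts.get(word)  # .get does not insert into the defaultdict
        if not followers:
            # Unseen word, or a word that only ever ended a sentence (assumed policy)
            return None
        source = rng if rng is not None else random
        # Counter preserves insertion order, so keys and values line up
        return source.choices(
            list(followers), weights=list(followers.values()), k=1
        )[0]


# Usage on the example training set
model = BigramModel()
model.train([["a", "b", "c"], ["a", "s", "d"]])
print(model.counts["a"])   # Counter({'b': 1, 's': 1})
print(model.predict("a"))  # "b" or "s", each with probability 1/2
print(model.predict("d"))  # None: "d" is never followed by another token
```

Because `random.choices` rebuilds the cumulative weights on every call, each prediction costs O(k) in the number of distinct followers of `w`, which is one of the costs part 3 asks about.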
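
For part 3, one possible direction is a read-only table built once after training. This is a hypothetical design, not a standard library structure: words are interned to integer ids, each word's followers are frozen into two packed `array` objects (follower ids and running totals of their counts), and an optional `top_k` parameter keeps only the most frequent followers per word. Sampling draws an integer uniformly from `[0, total)` and binary-searches the running totals with `bisect.bisect_right`, so a prediction costs O(log k) rather than rebuilding weights in O(k) on every call.

```python
from __future__ import annotations

import bisect
import random
from array import array
from collections import Counter, defaultdict
from itertools import accumulate


class CompactBigramTable:
    """Frozen bigram table with integer ids and cumulative counts (hypothetical)."""

    def __init__(self, sentences: list[list[str]], top_k: int | None = None) -> None:
        self.vocab: dict[str, int] = {}  # word -> integer id
        self.words: list[str] = []       # integer id -> word
        raw: defaultdict[int, Counter[int]] = defaultdict(Counter)
        for sentence in sentences:
            ids = [self._intern(w) for w in sentence]
            for prev, nxt in zip(ids, ids[1:]):
                raw[prev][nxt] += 1

        # Freeze each follower distribution into two packed parallel arrays
        self.followers: dict[int, array] = {}
        self.cumulative: dict[int, array] = {}
        for prev, counter in raw.items():
            # most_common(None) keeps every follower; an int keeps only the top_k
            items = counter.most_common(top_k)
            self.followers[prev] = array("q", (nxt for nxt, _ in items))
            self.cumulative[prev] = array("q", accumulate(c for _, c in items))
        # `raw` goes out of scope here, releasing the Counter objects

    def _intern(self, word: str) -> int:
        if word not in self.vocab:
            self.vocab[word] = len(self.words)
            self.words.append(word)
        return self.vocab[word]

    def predict(self, word: str, rng: random.Random | None = None) -> str | None:
        wid = self.vocab.get(word)
        if wid is None or wid not in self.cumulative:
            return None  # unseen word, or no observed successor
        cum = self.cumulative[wid]
        source = rng if rng is not None else random
        r = source.randrange(cum[-1])      # uniform integer in [0, total)
        idx = bisect.bisect_right(cum, r)  # first index whose running total exceeds r
        return self.words[self.followers[wid][idx]]


# Usage: "s" is pruned because top_k=1 keeps only the most frequent follower of "a"
table = CompactBigramTable([["a", "b"], ["a", "b"], ["a", "s"]], top_k=1)
print(table.predict("a"))  # always "b"
```

Pruning with `top_k` changes the model: probability mass is renormalized over the retained followers, so rare continuations can no longer be sampled. Other options worth raising in the discussion include the alias method (O(1) sampling after O(k) preprocessing per word) and sharding or memory-mapping the frozen arrays once they no longer fit in one process's memory.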

Related Interview Questions

  • Explain ranking cold-start strategies - Google (medium)
  • Explain LLM fine-tuning and generative models - Google (medium)
  • Compare NLP tokenization and LLM recommendations - Google (medium)
  • Explain LLM lifecycle and trade-offs - Google (medium)
  • Model Soccer Shot Conversion - Google (hard)
Jan 11, 2026, 12:00 AM
© 2026 PracHub. All rights reserved.