PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of data structures and algorithms for frequency counting and associative lookups, including preprocessing-versus-query trade-offs, tie-breaking, and handling edge cases in sequential data.

  • easy
  • Google
  • Coding & Algorithms
  • Software Engineer

Build a next-word frequency predictor

Company: Google

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Onsite

You are given a sequence of tokens (words) representing a corpus, e.g.: `tokens = [w0, w1, w2, ..., wK-1]` You need to build a **next-word predictor** that supports queries of the form: - `predict(word) -> next_word` Where `next_word` is the word that **most frequently follows** `word` in the corpus. More precisely: - For every adjacent pair `(tokens[i], tokens[i+1])`, count it as one occurrence of “`tokens[i+1]` follows `tokens[i]`”. - For a query `word = x`, return the `y` that maximizes `count(x, y)`. - If `x` never appears or never has a following word (e.g., it only appears as the last token), return a sentinel such as `null` / empty string. - If there is a tie for max frequency, you may return any tied word (or specify a deterministic tie-break, e.g., lexicographically smallest). ### Task Design the data structures and implement: 1. A one-time `build(tokens)` preprocessing step. 2. A `predict(word)` query. ### Constraints / expectations - The corpus is processed once (batch/offline build), then many queries may follow. - Aim for efficient preprocessing and fast queries.

Quick Answer: This question evaluates understanding of data structures and algorithms for frequency counting and associative lookups, including preprocessing-versus-query trade-offs, tie-breaking, and handling edge cases in sequential data.

You are given a list of words called tokens representing a corpus, and a list of query words. For every adjacent pair of words in the corpus, count how many times the second word immediately follows the first. After this one-time preprocessing, answer each query word x with the word that most frequently follows x in the corpus. If multiple words are tied for the highest frequency, return the lexicographically smallest one. If a query word never appears with a following word, return an empty string. Implement this as a single function that builds the predictor once and returns predictions for all queries.

Constraints

  • 0 <= len(tokens), len(queries) <= 200000
  • Each token and query is a string
  • If multiple next words have the same highest frequency, return the lexicographically smallest one
  • Your approach should preprocess once and answer each query in O(1) average time afterward

Examples

Input: (['i', 'am', 'sam', 'i', 'am', 'happy', 'i', 'am', 'sam'], ['i', 'am', 'sam', 'happy', 'unknown'])

Expected Output: ['am', 'sam', 'i', 'i', '']

Explanation: i is followed by am 3 times. am is followed by sam 2 times and happy 1 time, so sam wins. sam is followed by i once. happy is followed by i once. unknown has no following word.

Input: (['a', 'c', 'a', 'b', 'a', 'c', 'a', 'b'], ['a', 'b', 'c'])

Expected Output: ['b', 'a', 'a']

Explanation: For a, both b and c follow it 2 times, so the lexicographically smaller word b is returned. b is followed by a once. c is followed by a twice.

Input: (['solo'], ['solo', 'x'])

Expected Output: ['', '']

Explanation: There are no adjacent pairs in a single-word corpus, so no query has a valid prediction.

Input: ([], ['anything'])

Expected Output: ['']

Explanation: An empty corpus has no next-word information.

Input: (['go', 'home', 'go', 'home', 'stay'], ['home', 'stay', 'go'])

Expected Output: ['go', '', 'home']

Explanation: home is followed by go once and stay once, so go is chosen by lexicographic tie-break. stay has no following word. go is followed by home twice.

Hints

  1. Use a hash map from each word to another hash map that stores counts of words that follow it.
  2. To make queries fast, keep track of the current best next word for each word while building the counts.
Last updated: May 21, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Solve Rooms and Top-K Streams - Google (medium)
  • Find Containing Range - Google (medium)
  • Rearrange Tasks With Cooldown - Google (medium)
  • Implement Employee Management and Expression Evaluation - Google (medium)
  • Solve Three Array and Matrix Path Problems - Google (medium)