How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a easy difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Google.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Google during technical interviews.

Build a next-word frequency predictor | Google Coding Question

Build a next-word frequency predictor

Company: Google

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Onsite

You are given a sequence of tokens (words) representing a corpus, e.g.: `tokens = [w0, w1, w2, ..., wK-1]` You need to build a **next-word predictor** that supports queries of the form: - `predict(word) -> next_word` Where `next_word` is the word that **most frequently follows** `word` in the corpus. More precisely: - For every adjacent pair `(tokens[i], tokens[i+1])`, count it as one occurrence of “`tokens[i+1]` follows `tokens[i]`”. - For a query `word = x`, return the `y` that maximizes `count(x, y)`. - If `x` never appears or never has a following word (e.g., it only appears as the last token), return a sentinel such as `null` / empty string. - If there is a tie for max frequency, you may return any tied word (or specify a deterministic tie-break, e.g., lexicographically smallest). ### Task Design the data structures and implement: 1. A one-time `build(tokens)` preprocessing step. 2. A `predict(word)` query. ### Constraints / expectations - The corpus is processed once (batch/offline build), then many queries may follow. - Aim for efficient preprocessing and fast queries.

Quick Answer: This question evaluates understanding of data structures and algorithms for frequency counting and associative lookups, including preprocessing-versus-query trade-offs, tie-breaking, and handling edge cases in sequential data.

You are given a list of words called tokens representing a corpus, and a list of query words. For every adjacent pair of words in the corpus, count how many times the second word immediately follows the first. After this one-time preprocessing, answer each query word x with the word that most frequently follows x in the corpus. If multiple words are tied for the highest frequency, return the lexicographically smallest one. If a query word never appears with a following word, return an empty string. Implement this as a single function that builds the predictor once and returns predictions for all queries.

Constraints

0 <= len(tokens), len(queries) <= 200000
Each token and query is a string
If multiple next words have the same highest frequency, return the lexicographically smallest one
Your approach should preprocess once and answer each query in O(1) average time afterward

Examples

Input: (['i', 'am', 'sam', 'i', 'am', 'happy', 'i', 'am', 'sam'], ['i', 'am', 'sam', 'happy', 'unknown'])

Expected Output: ['am', 'sam', 'i', 'i', '']

Explanation: i is followed by am 3 times. am is followed by sam 2 times and happy 1 time, so sam wins. sam is followed by i once. happy is followed by i once. unknown has no following word.

Input: (['a', 'c', 'a', 'b', 'a', 'c', 'a', 'b'], ['a', 'b', 'c'])

Expected Output: ['b', 'a', 'a']

Explanation: For a, both b and c follow it 2 times, so the lexicographically smaller word b is returned. b is followed by a once. c is followed by a twice.

Input: (['solo'], ['solo', 'x'])

Expected Output: ['', '']

Explanation: There are no adjacent pairs in a single-word corpus, so no query has a valid prediction.

Input: ([], ['anything'])

Expected Output: ['']

Explanation: An empty corpus has no next-word information.

Input: (['go', 'home', 'go', 'home', 'stay'], ['home', 'stay', 'go'])

Expected Output: ['go', '', 'home']

Explanation: home is followed by go once and stay once, so go is chosen by lexicographic tie-break. stay has no following word. go is followed by home twice.

Hints

Use a hash map from each word to another hash map that stores counts of words that follow it.
To make queries fast, keep track of the current best next word for each word while building the counts.

Quick Overview

This question evaluates understanding of data structures and algorithms for frequency counting and associative lookups, including preprocessing-versus-query trade-offs, tie-breaking, and handling edge cases in sequential data.