Build a next-word frequency predictor

Q: Build a next-word frequency predictor

This is a Coding & Algorithms interview question from Google for Software Engineer roles. View the full question and solution on PracHub.

Q: How do I approach Coding & Algorithms interview questions?

Coding & Algorithms questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master coding & algorithms interviews.

Question

Loading...

You are given a sequence of tokens (words) representing a corpus, e.g.:

tokens = [w0, w1, w2, ..., wK-1]

You need to build a next-word predictor that supports queries of the form:

predict(word) -> next_word

Where next_word is the word that most frequently follows word in the corpus.

More precisely:

For every adjacent pair (tokens[i], tokens[i+1]) , count it as one occurrence of “ tokens[i+1] follows tokens[i] ”.
For a query word = x , return the y that maximizes count(x, y) .
If x never appears or never has a following word (e.g., it only appears as the last token), return a sentinel such as null / empty string.
If there is a tie for max frequency, you may return any tied word (or specify a deterministic tie-break, e.g., lexicographically smallest).

Task

Design the data structures and implement:

A one-time build(tokens) preprocessing step.
A predict(word) query.

Constraints / expectations

The corpus is processed once (batch/offline build), then many queries may follow.
Aim for efficient preprocessing and fast queries.

Build a next-word frequency predictor

Task

Constraints / expectations

Comments (0)