Design sequence decoding with greedy and beam search
Company: Cresta
Role: Software Engineer
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
You are given a probabilistic next-token dictionary D that maps each token t to a dictionary of candidate next tokens {u: p(u|t)}. Use '<START>' as the start token and '.' as the end-of-sequence token.
1) Implement greedy decoding that, starting from '<START>', repeatedly selects the highest-probability next token until '.' is produced or no continuation exists. Provide both a recursive and an iterative version, and return the sequence of tokens (include '.' if produced). Analyze time and space complexity.
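One possible answer, sketched under the assumptions stated in the problem (`D` maps tokens to `{next_token: probability}`, `'<START>'` starts, `'.'` ends). The `max_steps` guard is an addition not in the prompt, included to prevent infinite loops if `D` contains a cycle:

```python
def greedy_decode_iter(D, max_steps=100):
    """Iterative greedy decoding.

    Time: O(n * b) for n decoding steps and branching factor b
    (each step scans the candidate dict for the argmax).
    Space: O(n) for the output sequence.
    """
    seq = ['<START>']
    token = '<START>'
    for _ in range(max_steps):        # guard against cycles in D
        candidates = D.get(token)
        if not candidates:            # dead end: no continuation exists
            break
        token = max(candidates, key=candidates.get)  # argmax next token
        seq.append(token)
        if token == '.':              # end-of-sequence produced
            break
    return seq


def greedy_decode_rec(D, token='<START>', steps_left=100):
    """Recursive version: same time complexity, but the call stack
    adds O(n) space on top of the output list."""
    candidates = D.get(token)
    if steps_left == 0 or not candidates:
        return [token]
    best = max(candidates, key=candidates.get)
    return [token] + greedy_decode_rec(D, best, steps_left - 1) \
        if best != '.' else [token, '.']
```

On a toy dictionary such as `{'<START>': {'the': 0.6, 'a': 0.4}, 'the': {'cat': 0.7, 'dog': 0.3}, 'cat': {'.': 1.0}}`, both versions return `['<START>', 'the', 'cat', '.']`.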
2) Implement beam search decoding with beam width k using a BFS-style expansion: at each decoding step, maintain up to k highest-scoring partial sequences ranked by cumulative log-probability, expand each by its top-k next tokens from D, and continue until all beams end with '.'. Return the best sequence and optionally the top-k sequences. Explain how you handle ties, tokens missing from D, potential cycles or dead-ends, score normalization (e.g., length normalization), and termination criteria. Compare greedy and beam search in terms of quality, complexity, and typical use cases, and demonstrate your implementations on a small example dictionary.
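A hedged sketch of one way to answer part 2, scoring beams by cumulative log-probability and applying length normalization when ranking finished sequences. The helper names (`beam_search`, `max_steps`) and the tie/dead-end policies in the comments are illustrative choices, not requirements from the prompt:

```python
import heapq
import math

def beam_search(D, k=2, max_steps=50, length_normalize=True):
    """Return up to the top-k completed sequences as (score, tokens),
    best first.

    Policies assumed here:
    - Missing/empty entries in D are dead ends; that beam is dropped.
    - max_steps bounds expansion so cycles in D cannot loop forever.
    - Ties in score are broken arbitrarily by heapq.nlargest.
    - Finished beams (ending in '.') are set aside and ranked at the
      end by score / length when length_normalize is True, which
      avoids biasing toward short sequences.
    """
    beams = [(0.0, ['<START>'])]      # (cumulative log-prob, tokens)
    completed = []
    for _ in range(max_steps):
        candidates = []
        for score, seq in beams:
            nexts = D.get(seq[-1])
            if not nexts:             # dead end: discard this beam
                continue
            # expand each beam by its top-k next tokens only
            for tok, p in heapq.nlargest(k, nexts.items(),
                                         key=lambda kv: kv[1]):
                new = (score + math.log(p), seq + [tok])
                if tok == '.':
                    completed.append(new)
                else:
                    candidates.append(new)
        if not candidates:            # all beams finished or dead
            break
        # keep only the k highest-scoring partial sequences
        beams = heapq.nlargest(k, candidates, key=lambda b: b[0])

    def final_score(b):
        score, seq = b
        return score / len(seq) if length_normalize else score

    completed.sort(key=final_score, reverse=True)
    return completed[:k]
```

With `k = 1` this degenerates to greedy decoding; larger `k` trades roughly k-fold extra time and memory per step (O(k^2) candidate expansions) for higher-likelihood outputs, which is why greedy suits latency-critical generation while beam search is typical in machine translation and other quality-sensitive settings.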
Quick Answer: This question evaluates understanding of sequence decoding algorithms, probabilistic next-token modeling, and the quality/complexity trade-offs between greedy decoding and beam search for sequence generation.