Next-Token Decoding: Greedy and Beam Search
Context
You are given a probabilistic next-token dictionary D that maps each token t to a dictionary of candidate next tokens and their conditional probabilities: D[t] = {u: p(u | t)}. Use '<START>' as the start token and '.' as the end-of-sequence token. Assume probabilities in each D[t] sum to 1. If a token does not appear as a key in D, it has no continuations (dead-end).
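For concreteness, a dictionary of this shape might look like the following sketch (the token names are invented for illustration, not part of the task):

```python
# Hypothetical next-token dictionary; each inner dict sums to 1.
D = {
    '<START>': {'the': 0.7, 'a': 0.3},
    'the': {'cat': 0.6, 'dog': 0.4},
    'a': {'dog': 1.0},
    'cat': {'.': 1.0},
    'dog': {'.': 1.0},
}
# '.' never appears as a key, so it is a dead-end (here it is also
# the end-of-sequence token, so decoding stops there anyway).
```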
Tasks
- Greedy Decoding
  - Starting from '<START>', repeatedly select the highest-probability next token until '.' is produced or no continuation exists.
  - Provide both a recursive and an iterative version.
  - Return the sequence of tokens after '<START>' (include '.' if produced).
  - Analyze time and space complexity.
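One way the two greedy variants might look in Python (the function names and the `max_len` cycle guard are my own choices, not required by the task):

```python
def greedy_decode(D, start='<START>', end='.', max_len=100):
    """Iterative greedy decoding.

    Time: O(L * V) for an output of length L with at most V candidates
    per step (the max scan); space: O(L) for the output list.
    max_len is an illustrative safety cap against cycles.
    """
    seq, tok = [], start
    while tok in D and len(seq) < max_len:   # stop at dead-ends
        # Highest-probability continuation; probability ties are broken
        # by token string so the result is deterministic.
        tok = max(D[tok].items(), key=lambda kv: (kv[1], kv[0]))[0]
        seq.append(tok)
        if tok == end:                       # stop once '.' is produced
            break
    return seq

def greedy_decode_rec(D, tok='<START>', end='.'):
    """Recursive version: same output, O(L) call-stack space."""
    if tok not in D:                         # dead-end: nothing to emit
        return []
    nxt = max(D[tok].items(), key=lambda kv: (kv[1], kv[0]))[0]
    return [nxt] if nxt == end else [nxt] + greedy_decode_rec(D, nxt, end)
```

Both versions return the tokens after '<START>', including '.' when it is reached, and return whatever was produced so far if a dead-end cuts the sequence short.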
- Beam Search Decoding (beam width k)
  - Use a BFS-style expansion: at each decoding step, maintain up to k highest-scoring partial sequences ranked by cumulative log-probability.
  - Expand each active beam by its top-k next tokens from D; continue until all beams end with '.'.
  - Return the best sequence and, optionally, the top-k sequences.
  - Explain how you handle:
    - Ties
    - Tokens missing from D (dead-ends)
    - Potential cycles
    - Score normalization (e.g., length normalization)
    - Termination criteria
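A sketch of how these requirements might fit together in Python; the defaults for k, the max_len cycle cap, and the length-normalization exponent alpha are illustrative choices, not prescribed by the task:

```python
import heapq
import math

def beam_search(D, k=3, start='<START>', end='.', max_len=20, alpha=0.7):
    """Beam search over D by cumulative log-probability.

    Ties: beams are (score, tokens) tuples, so equal scores fall back to
    lexicographic token comparison, making the search deterministic.
    Dead-ends: a beam whose last token has no entry in D is dropped.
    Cycles: max_len bounds the number of expansion steps.
    Normalization: final ranking divides by length**alpha (alpha=0
    disables it). Termination: all beams finished, all dead, or max_len.
    """
    beams = [(0.0, (start,))]        # active partial sequences
    finished = []                    # sequences that reached end
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            tok = seq[-1]
            if tok not in D:         # dead-end: discard this beam
                continue
            # Expand by this beam's top-k continuations.
            for nxt, p in sorted(D[tok].items(), key=lambda kv: -kv[1])[:k]:
                cand = (score + math.log(p), seq + (nxt,))
                (finished if nxt == end else candidates).append(cand)
        if not candidates:           # every beam finished or died
            break
        beams = heapq.nlargest(k, candidates)   # keep k best partials
    # Rank finished hypotheses by length-normalized log-probability
    # (length counts tokens after start); return raw scores with them.
    ranked = sorted(finished,
                    key=lambda sq: sq[0] / (len(sq[1]) - 1) ** alpha,
                    reverse=True)
    return [(s, list(q[1:])) for s, q in ranked[:k]]   # drop '<START>'
```

With k=1 and alpha=0 this degenerates to greedy decoding; per step it does O(k * V log V) work sorting candidate expansions, and it stores O(k * L) tokens across the beams.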
- Compare greedy vs. beam search in quality, complexity, and typical use cases.
- Demonstrate your implementations on a small example dictionary.
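As a worked example, here is one small dictionary on which greedy decoding is suboptimal; the brute-force enumeration below is a checking aid (not one of the required decoders) that confirms the globally best sequence lies on the branch greedy ignores:

```python
# Greedy follows the locally best 0.6 branch, but every completion of
# that branch has probability 0.6 * 0.5 = 0.3, while the 0.4 branch
# completes with probability 0.4. (Token names are illustrative.)
D = {
    '<START>': {'a': 0.6, 'b': 0.4},
    'a': {'x': 0.5, 'y': 0.5},
    'b': {'.': 1.0},
    'x': {'.': 1.0},
    'y': {'.': 1.0},
}

def all_sequences(D, tok='<START>', prob=1.0):
    """Enumerate every complete sequence (ending in '.') with its probability."""
    if tok == '.':
        yield [], prob
        return
    for nxt, p in D.get(tok, {}).items():
        for rest, q in all_sequences(D, nxt, prob * p):
            yield [nxt] + rest, q

best = max(all_sequences(D), key=lambda sp: sp[1])
print(best)  # → (['b', '.'], 0.4): not the path greedy takes
```

Greedy commits to 'a' (0.6) and ends with total probability 0.3, while any beam of width 2 or more keeps the 'b' branch alive and recovers the 0.4 sequence.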