What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Anthropic.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Anthropic during technical interviews.

Implement a longest-match tokenizer | Anthropic Coding Question

Quick Overview

This question evaluates string-processing and algorithmic implementation skills for greedy longest-match tokenization, including handling runs of unmatched characters and reducing unnecessary comparisons when the vocabulary is small.


Part 2: Collapse consecutive unmatched characters into one -1

You are given a vocabulary that maps non-empty strings to integer token IDs, and a string text. Tokenize text from left to right using greedy longest-match rules: at each position, choose the longest vocabulary entry that matches the substring starting there, output its token ID, and advance by the matched length. If no vocabulary entry matches at the current position, treat that character as unmatched and advance by one character. Each maximal run of consecutive unmatched characters, whether one character or many, contributes exactly one -1 to the output.

Constraints

0 <= len(text) <= 10^4
0 <= len(vocab) <= 10^3
All vocabulary keys are unique, non-empty strings
Token IDs are integers

Examples

Input: ({'ab': 1, 'c': 2}, 'abxxczzz')

Expected Output: [1, -1, 2, -1]

Explanation: 'ab' matches, then 'xx' is one unmatched run, then 'c' matches, then 'zzz' is another unmatched run.

Input: ({'a': 1, 'aa': 2}, 'bbaaaac')

Expected Output: [-1, 2, 2, -1]

Explanation: 'bb' becomes one -1, then 'aa' and 'aa' match greedily, and the final 'c' becomes one -1.

Input: ({'xyz': 7}, 'abc')

Expected Output: [-1]

Explanation: All characters are unmatched, so they collapse into one -1.

Input: ({'a': 1}, '')

Expected Output: []

Explanation: Empty text produces no tokens.

Input: ({}, 'aab')

Expected Output: [-1]

Explanation: With an empty vocabulary, the whole text is one unmatched run.

Hints

Keep a flag that remembers whether you are currently inside an unmatched run.
Any successful token match should end an unmatched run.
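
The two hints above can be sketched in Python as follows. This is one possible approach under stated assumptions, not the official solution: the function name tokenize_collapsed and the variable in_unmatched_run are made up for illustration, and the lookup strategy (trying substring lengths from the longest vocabulary key down to 1) is just one simple way to find the longest match.

```python
from typing import Dict, List


def tokenize_collapsed(vocab: Dict[str, int], text: str) -> List[int]:
    """Greedy longest-match tokenization; each unmatched run yields one -1.

    Hypothetical helper written for illustration only.
    """
    # No match can be longer than the longest vocabulary key.
    max_len = max(map(len, vocab), default=0)
    tokens: List[int] = []
    in_unmatched_run = False  # the flag suggested by the first hint
    i = 0
    while i < len(text):
        # Try candidate lengths from longest to shortest so that the
        # first hit is the greedy (longest) match.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(vocab[piece])
                i += length
                in_unmatched_run = False  # a match ends any unmatched run
                break
        else:
            # No vocabulary entry starts here: emit -1 only at the
            # start of a run, then skip one character.
            if not in_unmatched_run:
                tokens.append(-1)
                in_unmatched_run = True
            i += 1
    return tokens


print(tokenize_collapsed({'ab': 1, 'c': 2}, 'abxxczzz'))
```

Tracking the run with a separate flag, rather than checking whether the last output is -1, keeps the result correct even if some vocabulary entry happens to use -1 as its token ID.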

Part 3: Optimize longest-match tokenization for a small vocabulary

Implement the same greedy longest-match tokenizer as in Part 1, but reduce unnecessary comparisons without using a trie. Preprocess the vocabulary by grouping entries by their first character. Then, at each position i in text, compare only against vocabulary entries whose first character equals text[i]. Within each group, preserve greedy longest-match behavior by trying longer strings first. If nothing matches at the current position, output -1 and advance by one character. Unlike Part 2, unmatched runs are not collapsed here: every unmatched character produces its own -1.

Constraints

0 <= len(text) <= 10^4
0 <= len(vocab) <= 10^3
All vocabulary keys are unique, non-empty strings
Do not use a trie
Token IDs are integers

Examples

Input: ({'the': 1, 'th': 2, 'he': 3, 'a': 4}, 'thea!')

Expected Output: [1, 4, -1]

Explanation: At the start, only 'the' and 'th' need to be checked because text begins with 't'. 'the' is the greedy match.

Input: ({'x': 5, 'xy': 6, 'xyz': 7, 'ab': 1}, 'xyzxaby')

Expected Output: [7, 5, 1, -1]

Explanation: 'xyz' matches first, then 'x', then 'ab', and finally 'y' is unmatched.

Input: ({'aa': 1, 'ab': 2, 'b': 3}, 'abbaa')

Expected Output: [2, 3, 1]

Explanation: 'ab' matches first, then 'b', then 'aa'. Grouping by first character avoids checking impossible candidates.

Input: ({'a': 1}, '')

Expected Output: []

Explanation: Empty text produces no tokens.

Input: ({}, 'ab')

Expected Output: [-1, -1]

Explanation: With no vocabulary entries, every character is unmatched.

Hints

A vocabulary word can only match at position i if its first character equals text[i].
Sort each first-character group by descending length so the first successful match is still the greedy one.
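
A minimal Python sketch of the grouping idea described in the hints above. The names build_index and tokenize_grouped are hypothetical, and this is one way to satisfy the "no trie" constraint under those assumptions rather than a reference solution.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(vocab: Dict[str, int]) -> Dict[str, List[Tuple[str, int]]]:
    """Group (word, token_id) pairs by first character, longest words first."""
    groups: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for word, token_id in vocab.items():
        groups[word[0]].append((word, token_id))  # keys are non-empty
    for candidates in groups.values():
        # Descending length: the first successful match is the greedy one.
        candidates.sort(key=lambda pair: len(pair[0]), reverse=True)
    return dict(groups)


def tokenize_grouped(vocab: Dict[str, int], text: str) -> List[int]:
    """Greedy longest-match tokenization; each unmatched character yields -1."""
    index = build_index(vocab)
    tokens: List[int] = []
    i = 0
    while i < len(text):
        # Only entries that start with text[i] can possibly match here.
        for word, token_id in index.get(text[i], ()):
            if text.startswith(word, i):
                tokens.append(token_id)
                i += len(word)
                break
        else:
            # No candidate matched: emit -1 for this single character.
            tokens.append(-1)
            i += 1
    return tokens


print(tokenize_grouped({'the': 1, 'th': 2, 'he': 3, 'a': 4}, 'thea!'))
```

Passing the offset to str.startswith(word, i) avoids building a new substring for every comparison, and the index is built once per call, so the sorting cost is paid once rather than at every position.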
