Find Dictionary Matches for a Jammed String
Company: Google
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates string-processing and pattern-matching skills, specifically run-length grouping of consecutive characters and efficient dictionary lookup, within the Coding & Algorithms domain.
Part 1: Find All Dictionary Words That Could Produce a Jammed String
Constraints
- 0 <= len(dictionary_words) <= 10^4
- 0 <= len(jammed_string) <= 10^5
- The sum of lengths of all dictionary words is at most 2 * 10^5
- All strings contain only lowercase English letters
Examples
Input: (['help', 'hello', 'shell', 'hellp'], 'hellp')
Expected Output: ['help', 'hellp']
Explanation: help can expand its single l into ll, and hellp already matches exactly.
Input: (['hello', 'halo', 'heeello'], 'hheeellllo')
Expected Output: ['hello', 'heeello']
Explanation: hello and heeello share the same grouped signature as the jammed string and fit within its group counts.
Input: ([], 'abc')
Expected Output: []
Explanation: With no dictionary words, there are no matches.
Input: (['a', 'aa', 'b'], 'aaa')
Expected Output: ['a', 'aa']
Explanation: Both a and aa can expand to aaa, but b has the wrong character group.
Hints
- Compress each string into two parts: the sequence of group characters and the list of group counts.
- A word matches only if the group-character sequence is identical and every word count is at most the corresponding jammed count.
Part 2: Answer Many Jammed Queries with Signature Bucketing
Constraints
- 0 <= len(dictionary_words) <= 10^4
- 0 <= len(queries) <= 10^4
- The total length of all dictionary words plus all queries is at most 3 * 10^5
- All strings contain only lowercase English letters
Examples
Input: (['help', 'hello', 'hellp', 'banana', 'bananaa', 'heeello'], ['hellp', 'hheeellllo', 'baanaaannnaaa', 'xyz'])
Expected Output: [['help', 'hellp'], ['hello', 'heeello'], ['banana', 'bananaa'], []]
Explanation: Each query only needs to inspect words from its matching signature bucket.
Input: (['a', 'aa'], [])
Expected Output: []
Explanation: No queries means no answers.
Input: ([], ['a', 'aaa'])
Expected Output: [[], []]
Explanation: An empty dictionary produces no matches for any query.
Input: (['a', 'aa', 'ab'], ['aaa', 'abb'])
Expected Output: [['a', 'aa'], ['ab']]
Explanation: aaa matches the single-group bucket, while abb matches only ab.
Hints
- Build a hash map from compressed signature to the words that have that signature.
- For a query, only compare against the bucket with the same signature.
Part 3: Reuse Precomputed Group Counts for Fast Repeated Queries
Constraints
- 0 <= len(dictionary_words) <= 10^4
- 0 <= len(queries) <= 10^4
- The total length of all dictionary words plus all queries is at most 3 * 10^5
- All strings contain only lowercase English letters
Examples
Input: (['ab', 'aab', 'abb', 'aaabbb', 'xyz'], ['aaabbb', 'ab', 'aabbb', 'xxyzzz'])
Expected Output: [['ab', 'aab', 'abb', 'aaabbb'], ['ab'], ['ab', 'aab', 'abb'], ['xyz']]
Explanation: After signature filtering, each candidate is checked using only its stored counts.
Input: (['hello', 'heeello'], ['halo'])
Expected Output: [[]]
Explanation: The query has a different signature, so no bucket matches.
Input: ([], ['a', 'aa'])
Expected Output: [[], []]
Explanation: No dictionary words means no matches.
Input: (['a'], [])
Expected Output: []
Explanation: No queries means no output rows.
Hints
- Store each dictionary word as a pair of its original word and its group-count vector, keyed by signature.
- Inside one signature bucket, matching becomes a simple component-wise comparison of integer arrays.
Part 4: Match a Same-Signature Bucket Using Only Group Counts
Constraints
- 0 <= len(entries) <= 10^4
- 0 <= len(queries) <= 10^4
- All entries in the bucket are guaranteed to represent the same hidden signature
- 1 <= len(counts) <= 100 for every non-empty entry
- All count values are positive integers
Examples
Input: ([('help', [1, 1, 1, 1]), ('hellp', [1, 1, 2, 1]), ('heeelp', [1, 3, 1, 1])], [[1, 1, 2, 1], [2, 3, 4, 1], [1, 1, 1]])
Expected Output: [['help', 'hellp'], ['help', 'hellp', 'heeelp'], []]
Explanation: The third query has the wrong number of groups, so it matches nothing.
Input: ([], [[1], [2]])
Expected Output: [[], []]
Explanation: An empty bucket yields no matches for any query.
Input: ([('a', [1]), ('aa', [2])], [[1], [3]])
Expected Output: [['a'], ['a', 'aa']]
Explanation: The second query can accommodate both stored count vectors.
Input: ([('aaa', [3]), ('aaaaa', [5])], [[4], [5]])
Expected Output: [['aaa'], ['aaa', 'aaaaa']]
Explanation: Counts are compared component-wise within a single-group bucket.
Hints
- Because the signature is shared by the whole bucket, you never need to compare characters per word.
- A word matches exactly when every stored count is less than or equal to the corresponding query count.