Design autocomplete with Trie
Company: Google
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates understanding of Trie-based string indexing, augmented per-node metadata for efficient prefix queries, and algorithmic analysis of insert, update, delete, and topK operations including their time and space complexity.
Constraints
- Words consist of lowercase English letters a-z only (branching factor <= 26).
- Weights are integers fitting in a machine word; ties broken by lexicographic order of the word.
- topK is the hot path; mutations are comparatively rare.
- Dictionary size: thousands to low millions of words, single-machine in-memory.
- insert on an existing word overwrites its weight; update/delete on a missing word is a no-op.
- topK with an empty prefix returns the global top-k; a prefix with no matches returns [].
Examples
Input: ([['insert', 'cat', 5], ['insert', 'car', 3], ['insert', 'cart', 8], ['insert', 'dog', 2], ['topK', 'ca', 2], ['update', 'car', 9], ['topK', 'ca', 2], ['delete', 'cart'], ['topK', 'ca', 2]],)
Expected Output: [['cart', 'cat'], ['car', 'cart'], ['car', 'cat']]
Explanation: The question's exact dry run. After the inserts, 'ca' subtree holds cart:8, cat:5, car:3, so topK('ca',2)=[cart,cat]. update('car',9) makes car the top, so topK=[car,cart]. delete('cart') leaves car:9, cat:5, so topK=[car,cat]. 'dog' lives in a different subtree and is never touched.
Input: ([['topK', 'ca', 3]],)
Expected Output: [[]]
Explanation: Querying a prefix on an empty dictionary: the descent hits a missing child immediately, so topK returns [].
Input: ([['insert', 'a', 1], ['topK', '', 5], ['topK', 'a', 5], ['topK', 'b', 2]],)
Expected Output: [['a'], ['a'], []]
Explanation: Empty prefix returns the global top-k (root cache) = ['a']; prefix 'a' matches the exact word 'a'; prefix 'b' has no node, returning []. Also exercises the fewer-than-k case (only 1 word but k=5).
Input: ([['insert', 'app', 5], ['insert', 'apple', 5], ['insert', 'apply', 5], ['topK', 'app', 10]],)
Expected Output: [['app', 'apple', 'apply']]
Explanation: Three words with equal weight 5 share prefix 'app'; ties break lexicographically so the order is app < apple < apply. Also tests an internal node ('app') that is itself a complete word.
Input: ([['insert', 'car', 3], ['insert', 'cart', 3], ['update', 'car', 10], ['delete', 'missing'], ['topK', 'car', 2], ['delete', 'car'], ['topK', 'car', 2]],)
Expected Output: [['car', 'cart'], ['cart']]
Explanation: update('car',10) lifts car above cart. delete('missing') is a safe no-op. After delete('car'), the node 'car' is still on the path to 'cart' but is no longer a terminal word, so topK('car',2) returns only ['cart'] — verifying delete clears the terminal weight while keeping descendants reachable.
Input: ([['insert', 'b', 5], ['insert', 'a', 5], ['insert', 'c', 5], ['topK', '', 2]],)
Expected Output: [['a', 'b']]
Explanation: Three single-letter words all with weight 5; empty-prefix topK with k=2 returns the two lexicographically smallest, ['a','b'], confirming the (-weight, word) ordering at the root cache regardless of insertion order.
Hints
- A Trie answers 'which words share this prefix?' for free: after walking the prefix, the node you land on roots the subtree of all matching words. The open question is getting top-k by weight out of that subtree without scanning it on every query.
- Propagate the best candidates up the spine: at each node cache a small sorted list of its subtree's top words keyed by (-weight, word). Then topK is just walk-to-prefix-node and read the cached list — O(P + k).
- A weight change or deletion can only affect the cached lists of ancestors on the path from the word's terminal node to the root. Re-derive each node's cache bottom-up along that path from its own word plus its children's caches; every node off the path stays valid.
- Include the node's OWN terminal word when rebuilding, not just children — an internal node can also be a complete word (e.g. 'car' is a prefix of 'cart').