Implement an extensible prefix tree
Company: Anthropic
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates the ability to design and implement an extensible prefix tree (trie), covering competencies in Unicode-aware string handling, memory and time optimization, concurrency or thread-safety considerations, and support for operations like insert, search, startsWith, countPrefix and erase.
Constraints
- 1 <= len(operations) == len(arguments) <= 2 * 10^5
- Each operations[i] is one of insert, search, startsWith, countPrefix, or erase
- 0 <= len(arguments[i]) <= 10^3
- The sum of all argument lengths is at most 2 * 10^5
- Arguments may contain arbitrary Unicode characters, and duplicate words are allowed
Examples
Input: (['insert','insert','search','search','startsWith','countPrefix','erase','search','countPrefix','startsWith'], ['apple','app','apple','ap','ap','app','apple','apple','app','apple'])
Expected Output: [None, None, True, False, True, 2, True, False, 1, False]
Explanation: After inserting apple and app, both share prefixes ap and app. Erasing apple removes only that word, leaving app.
Input: (['insert','insert','countPrefix','search','erase','search','countPrefix','erase','search','erase'], ['cat','cat','ca','cat','cat','cat','ca','cat','cat','cat'])
Expected Output: [None, None, 2, True, True, True, 1, True, False, False]
Explanation: Duplicate inserts are counted separately. Each erase removes one copy, and the final erase fails because cat is no longer stored.
Input: (['insert','insert','search','startsWith','countPrefix','erase','countPrefix','search','startsWith'], ['café','咖啡','café','咖','ca','咖啡','咖','咖啡','咖'])
Expected Output: [None, None, True, True, 1, True, 0, False, False]
Explanation: Unicode words behave like normal strings. Deleting 咖啡 removes the only word with prefix 咖.
Input: (['search','startsWith','countPrefix','insert','search','startsWith','countPrefix','erase','search','countPrefix'], ['','','','','','','','','',''])
Expected Output: [False, False, 0, None, True, True, 1, True, False, 0]
Explanation: Checks the empty string as both a stored word and a prefix, before insertion, after insertion, and after deletion.
Input: (['insert','insert','insert','erase','erase','countPrefix','search','startsWith','countPrefix'], ['a','ab','abc','ab','ab','a','abc','ab',''])
Expected Output: [None, None, None, True, False, 2, True, True, 2]
Explanation: The second erase of ab returns False even though abc still keeps the prefix path alive. Prefix queries must still work correctly.
Solution
def solution(operations, arguments):
import threading
if len(operations) != len(arguments):
raise ValueError('operations and arguments must have the same length')
class Node:
__slots__ = ('children', 'pass_count', 'end_count')
def __init__(self):
self.children = {}
self.pass_count = 0
self.end_count = 0
class Trie:
def __init__(self):
self.root = Node()
self._lock = threading.RLock()
def _find(self, text):
node = self.root
for ch in text:
node = node.children.get(ch)
if node is None:
return None
return node
def insert(self, word):
with self._lock:
node = self.root
node.pass_count += 1
for ch in word:
nxt = node.children.get(ch)
if nxt is None:
nxt = Node()
node.children[ch] = nxt
node = nxt
node.pass_count += 1
node.end_count += 1
def search(self, word):
with self._lock:
node = self._find(word)
return node is not None and node.end_count > 0
def startsWith(self, prefix):
with self._lock:
node = self._find(prefix)
return node is not None and node.pass_count > 0
def countPrefix(self, prefix):
with self._lock:
node = self._find(prefix)
return 0 if node is None else node.pass_count
def erase(self, word):
with self._lock:
node = self.root
path = [self.root]
for ch in word:
node = node.children.get(ch)
if node is None:
return False
path.append(node)
if path[-1].end_count == 0:
return False
self.root.pass_count -= 1
for current in path[1:]:
current.pass_count -= 1
path[-1].end_count -= 1
for i in range(len(word) - 1, -1, -1):
parent = path[i]
ch = word[i]
child = parent.children[ch]
if child.pass_count == 0:
del parent.children[ch]
else:
break
return True
trie = Trie()
results = []
for op, arg in zip(operations, arguments):
if op == 'insert':
trie.insert(arg)
results.append(None)
elif op == 'search':
results.append(trie.search(arg))
elif op == 'startsWith':
results.append(trie.startsWith(arg))
elif op == 'countPrefix':
results.append(trie.countPrefix(arg))
elif op == 'erase':
results.append(trie.erase(arg))
else:
raise ValueError(f'Unknown operation: {op}')
return resultsTime complexity: O(L) per operation, where L is the length of the current word or prefix; equivalently O(sum(len(arguments[i]))) total across the full batch.. Space complexity: O(T), where T is the total number of characters represented by active trie nodes; insert may allocate up to O(L) new nodes for one operation..
Hints
- Store two counters at each node: how many words pass through the node and how many words end there. This makes duplicates and countPrefix easy to support.
- During erase, record the path from the root to the terminal node. After decrementing counts, walk backward and delete child links whose pass count becomes zero.