Design high-throughput hashing for kernels
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates understanding of cache-efficient, high-throughput hash lookup design and related competencies such as hashing strategies, probe behavior, memory layout, vectorization, bitwise optimizations, prefetching, alignment, and microarchitectural performance trade-offs.
Constraints
- 0 <= len(pairs), len(queries) <= 2 * 10^5
- Keys are 32-bit signed integers
- Values are integers in the range [0, 10^9]
- Capacity is the smallest power of two >= ceil(10 * len(pairs) / 7), with minimum 1
Examples
Input: ([(1, 10), (9, 20), (17, 30)], [1, 17, 2])
Expected Output: (8, [10, 30, -1], 6, 7)
Explanation: Capacity is 8. Keys 1, 9, and 17 all start at slot 1 and are placed at slots 1, 2, and 3. Insert probes = 1 + 2 + 3 = 6. Lookups use the same probe sequence, giving results [10, 30, -1] and lookup probes 1 + 3 + 3 = 7.
Input: ([(5, 1), (13, 2), (5, 7)], [5, 13, 21])
Expected Output: (8, [7, 2, -1], 4, 6)
Explanation: Capacity is 8. Key 5 is inserted, 13 collides and moves to the next slot, then key 5 appears again and overwrites its existing value. Insert probes = 1 + 2 + 1 = 4. Query results are [7, 2, -1] with lookup probes 1 + 2 + 3 = 6.
Input: ([(-1, 4), (7, 8), (-9, 6)], [-9, -1, 15])
Expected Output: (8, [6, 4, -1], 6, 8)
Explanation: With capacity 8, -1, 7, and -9 all start at slot 7 and wrap around to slots 7, 0, and 1. Insert probes = 1 + 2 + 3 = 6. Lookups return [6, 4, -1] with lookup probes 3 + 1 + 4 = 8.
Input: ([], [1, -1])
Expected Output: (1, [-1, -1], 0, 2)
Explanation: An empty table still has minimum capacity 1. No inserts are performed. Each lookup examines the only empty slot once, so lookup probes = 2.
Solution
def solution(pairs, queries):
n = len(pairs)
# ceil(10 * n / 7) without floating point
required = (10 * n + 6) // 7
capacity = 1
target = max(1, required)
while capacity < target:
capacity <<= 1
mask = capacity - 1
occupied = [False] * capacity
keys = [0] * capacity
values = [0] * capacity
insert_probes = 0
for key, value in pairs:
idx = ((key * 2654435761) & 0xFFFFFFFF) & mask
while True:
insert_probes += 1
if not occupied[idx]:
occupied[idx] = True
keys[idx] = key
values[idx] = value
break
if keys[idx] == key:
values[idx] = value
break
idx = (idx + 1) & mask
answers = []
lookup_probes = 0
for key in queries:
idx = ((key * 2654435761) & 0xFFFFFFFF) & mask
while True:
lookup_probes += 1
if not occupied[idx]:
answers.append(-1)
break
if keys[idx] == key:
answers.append(values[idx])
break
idx = (idx + 1) & mask
return (capacity, answers, insert_probes, lookup_probes)Time complexity: Average O(len(pairs) + len(queries)). Space complexity: O(capacity).
Hints
- Because the capacity is a power of two, you can wrap around with & (capacity - 1) instead of using modulo.
- With open addressing, a lookup can stop as soon as it reaches an empty slot, because the key could not appear later in that probe sequence.