PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of cache-efficient, high-throughput hash lookup design and related competencies such as hashing strategies, probe behavior, memory layout, vectorization, bitwise optimizations, prefetching, alignment, and microarchitectural performance trade-offs.

  • Medium
  • Anthropic
  • Coding & Algorithms
  • Software Engineer

Design high-throughput hashing for kernels

Company: Anthropic

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

Design a high-throughput hash-based lookup to be called inside a tight kernel. Choose between open addressing and chaining, specify the load factor, probe sequence, and table layout to favor vectorization and contiguous accesses. Show how to use bitwise operations, prefetching, and alignment to reduce collisions and cache misses. Explain trade-offs and how you would profile and validate the design.

Quick Answer: This question evaluates understanding of cache-efficient, high-throughput hash lookup design and related competencies such as hashing strategies, probe behavior, memory layout, vectorization, bitwise optimizations, prefetching, alignment, and microarchitectural performance trade-offs.

You are modeling the core of a hash-based lookup that will be called inside a tight kernel. To favor contiguous memory accesses, the design uses open addressing with linear probing instead of chaining. Implement the exact hash table behavior below. 1. Let n be the number of input pairs. 2. Choose the table capacity as the smallest power of two greater than or equal to ceil(10 * n / 7), with a minimum capacity of 1. 3. Store entries in a contiguous table using open addressing. 4. Use this hash for the initial slot: initial_index = ((key * 2654435761) & 0xFFFFFFFF) & (capacity - 1) 5. On collision, probe linearly: next_index = (index + 1) & (capacity - 1) 6. Insert pairs in order. If a key appears more than once, overwrite the previous value in place. 7. For lookups, return the stored value, or -1 if the key is not found. A probe is counted every time you examine one table slot during insertion or lookup. Return a tuple: (capacity, answers, insert_probes, lookup_probes) where answers is the list of lookup results for the query keys. This problem focuses on simulating a kernel-friendly table layout. In a lower-level language, the same design would naturally support alignment, prefetching, and contiguous accesses.

Constraints

  • 0 <= len(pairs), len(queries) <= 2 * 10^5
  • Keys are 32-bit signed integers
  • Values are integers in the range [0, 10^9]
  • Capacity is the smallest power of two >= ceil(10 * len(pairs) / 7), with minimum 1

Examples

Input: ([(1, 10), (9, 20), (17, 30)], [1, 17, 2])

Expected Output: (8, [10, 30, -1], 6, 7)

Explanation: Capacity is 8. Keys 1, 9, and 17 all start at slot 1 and are placed at slots 1, 2, and 3. Insert probes = 1 + 2 + 3 = 6. Lookups use the same probe sequence, giving results [10, 30, -1] and lookup probes 1 + 3 + 3 = 7.

Input: ([(5, 1), (13, 2), (5, 7)], [5, 13, 21])

Expected Output: (8, [7, 2, -1], 4, 6)

Explanation: Capacity is 8. Key 5 is inserted, 13 collides and moves to the next slot, then key 5 appears again and overwrites its existing value. Insert probes = 1 + 2 + 1 = 4. Query results are [7, 2, -1] with lookup probes 1 + 2 + 3 = 6.

Input: ([(-1, 4), (7, 8), (-9, 6)], [-9, -1, 15])

Expected Output: (8, [6, 4, -1], 6, 8)

Explanation: With capacity 8, -1, 7, and -9 all start at slot 7 and wrap around to slots 7, 0, and 1. Insert probes = 1 + 2 + 3 = 6. Lookups return [6, 4, -1] with lookup probes 3 + 1 + 4 = 8.

Input: ([], [1, -1])

Expected Output: (1, [-1, -1], 0, 2)

Explanation: An empty table still has minimum capacity 1. No inserts are performed. Each lookup examines the only empty slot once, so lookup probes = 2.

Solution

def solution(pairs, queries):
    n = len(pairs)

    # ceil(10 * n / 7) without floating point
    required = (10 * n + 6) // 7

    capacity = 1
    target = max(1, required)
    while capacity < target:
        capacity <<= 1

    mask = capacity - 1
    occupied = [False] * capacity
    keys = [0] * capacity
    values = [0] * capacity

    insert_probes = 0

    for key, value in pairs:
        idx = ((key * 2654435761) & 0xFFFFFFFF) & mask
        while True:
            insert_probes += 1
            if not occupied[idx]:
                occupied[idx] = True
                keys[idx] = key
                values[idx] = value
                break
            if keys[idx] == key:
                values[idx] = value
                break
            idx = (idx + 1) & mask

    answers = []
    lookup_probes = 0

    for key in queries:
        idx = ((key * 2654435761) & 0xFFFFFFFF) & mask
        while True:
            lookup_probes += 1
            if not occupied[idx]:
                answers.append(-1)
                break
            if keys[idx] == key:
                answers.append(values[idx])
                break
            idx = (idx + 1) & mask

    return (capacity, answers, insert_probes, lookup_probes)

Time complexity: Average O(len(pairs) + len(queries)). Space complexity: O(capacity).

Hints

  1. Because the capacity is a power of two, you can wrap around with & (capacity - 1) instead of using modulo.
  2. With open addressing, a lookup can stop as soon as it reaches an empty slot, because the key could not appear later in that probe sequence.
Last updated: Apr 26, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a Banking System - Anthropic (medium)
  • Implement Persistent Memoization LRU Cache - Anthropic (hard)
  • Fix a Corrupted Bootloader Instruction - Anthropic (medium)
  • Implement a Time-Aware Task Manager - Anthropic (hard)
  • Implement a Simplified DNS Resolver - Anthropic (hard)