Apply bitwise tricks for performance
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates proficiency with bitwise operations, branchless programming, low-level performance optimizations, and microarchitectural reasoning about pipeline stalls, instruction-level parallelism, and cache behavior.
Constraints
- 0 <= len(nums) <= 200000
- 0 <= nums[i] < 2^63
- 1 <= bucket_count <= 65536, and bucket_count is a power of two
- 1 <= align <= 1048576, and align is a power of two
- 0 <= threshold <= 63
Examples
Input: ([5, 8, 3], 4, 4, 2)
Expected Output: [-1, -2, 0, -2]
Explanation: 5 has popcount 2 and is unaligned, so bucket 1 gets -2. 8 has popcount 1, is aligned, and contributes -1 to bucket 0. 3 has popcount 2 and is unaligned, so bucket 3 gets -2.
Input: ([4, 7, 12, 2], 8, 4, 2)
Expected Output: [0, 0, 1, 0, 1, 0, 0, -3]
Explanation: 4 contributes -1 to bucket 4, 7 contributes -3 to bucket 7, 12 contributes +2 to bucket 4, and 2 is unaligned with popcount 1 so it flips sign and contributes +1 to bucket 2.
Input: ([], 2, 2, 1)
Expected Output: [0, 0]
Explanation: No numbers are processed, so every bucket remains 0.
Input: ([0, 1, 2, 3, 4, 4], 4, 2, 1)
Expected Output: [2, -1, 1, -2]
Explanation: This case includes zero and duplicates. Zero contributes 0. The two copies of 4 both map to bucket 0 and each add +1.
Input: ([15, 16, 31, 32], 8, 16, 4)
Expected Output: [-2, 0, 0, 0, 0, 0, 0, -9]
Explanation: 15 and 31 both have high popcount but are not 16-aligned, so they contribute -4 and -5 to bucket 7. 16 and 32 are aligned but have popcount below the threshold, so they contribute -1 each to bucket 0.
Input: ([1], 1, 1, 0)
Expected Output: [1]
Explanation: With one bucket, every value maps to bucket 0. With align = 1, every value is aligned. Since threshold = 0, the popcount is always taken as positive.
Solution
def solution(nums, bucket_count, align, threshold):
# Branchless select: returns a if cond else b.
# If cond is True, mask is -1; otherwise it is 0.
# Then b ^ ((a ^ b) & mask) picks a or b without an if-statement.
def select(cond, a, b):
mask = -int(cond)
return b ^ ((a ^ b) & mask)
scores = [0] * bucket_count
bucket_mask = bucket_count - 1
align_mask = align - 1
# In lower-level code, these idioms avoid division, reduce unpredictable
# branches, improve pipeline usage and ILP, and keep the hot scores array
# cache-friendly.
for x in nums:
# Because bucket_count is a power of two, masking is equivalent to modulo.
bucket = x & bucket_mask
# Population count.
ones = x.bit_count()
# signed = ones if ones >= threshold else -ones
signed = select(ones >= threshold, ones, -ones)
# x is aligned iff its low log2(align) bits are all zero.
# delta = signed if aligned else -signed
delta = select((x & align_mask) == 0, signed, -signed)
scores[bucket] += delta
return scoresTime complexity: O(n + bucket_count). Space complexity: O(bucket_count).
Hints
- Because `bucket_count` is a power of two, `x % bucket_count` can be replaced by a bit mask on the low bits.
- Think of each `if` as choosing between two values. A boolean can be turned into a mask and used for a branchless select.