Design dynamic weighted random sampling with updates
Company: Citadel
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates a candidate's ability to design and analyze dynamic data structures that support weighted random sampling with inserts and deletes, testing skills in algorithm design, complexity analysis, and randomized sampling under large numeric constraints.
Constraints
- 1 <= len(operations) <= 100000
- 1 <= weight <= 10^9 for insert operations
- IDs are integers and may be sparse or large in magnitude
- The total active weight can exceed 32-bit integer range
- Insert on an existing ID must be treated as a weight update
- Delete on a missing ID is a no-op
Examples
Input: [("insert", 10, 4), ("insert", 20, 6), ("sample", 1), ("sample", 4), ("sample", 5), ("sample", 10)]
Expected Output: [10, 10, 20, 20]
Explanation: In increasing ID order, 10 owns [1,4] and 20 owns [5,10].
Input: [("insert", 5, 3), ("insert", 2, 2), ("sample", 4), ("insert", 5, 1), ("sample", 2), ("delete", 2), ("sample", 1), ("delete", 5), ("sample", 1)]
Expected Output: [5, 2, 5, -1]
Explanation: Insert on existing ID updates its weight. After all deletions, sampling from an empty set returns -1.
Input: [("delete", 7), ("insert", 7, 1000000000), ("sample", 1), ("sample", 1000000000), ("insert", 3, 1000000000), ("sample", 1000000000), ("sample", 1000000001)]
Expected Output: [7, 7, 3, 7]
Explanation: This checks large weights and boundary values. After inserting 3, IDs are ordered as 3 then 7.
Input: [("insert", 1, 5), ("insert", 4, 2), ("insert", 3, 4), ("delete", 4), ("sample", 8), ("insert", 4, 10), ("sample", 9), ("insert", 1, 1), ("sample", 1), ("sample", 5)]
Expected Output: [3, 3, 1, 3]
Explanation: This case mixes inserts, updates, deletes, and boundary prefix sums.
Input: [("sample", 1), ("sample", 2)]
Expected Output: [-1, -1]
Explanation: Sampling from an always-empty structure returns -1 each time.
Solution
def solution(operations):
ids = sorted({op[1] for op in operations if op[0] in ("insert", "delete")})
n = len(ids)
index = {item_id: i + 1 for i, item_id in enumerate(ids)}
bit = [0] * (n + 1)
current = {}
total_weight = 0
def add(i, delta):
nonlocal total_weight
total_weight += delta
while i <= n:
bit[i] += delta
i += i & -i
def find_by_prefix(k):
idx = 0
step = 1 << (n.bit_length() - 1)
while step:
nxt = idx + step
if nxt <= n and bit[nxt] < k:
idx = nxt
k -= bit[nxt]
step >>= 1
return idx + 1
answer = []
for op in operations:
typ = op[0]
if typ == "insert":
_, item_id, weight = op
old = current.get(item_id, 0)
current[item_id] = weight
add(index[item_id], weight - old)
elif typ == "delete":
_, item_id = op
old = current.pop(item_id, 0)
if old:
add(index[item_id], -old)
else: # sample
_, r = op
if total_weight <= 0 or r < 1 or r > total_weight:
answer.append(-1)
else:
pos = find_by_prefix(r)
answer.append(ids[pos - 1])
return answerTime complexity: O(m log m) overall for m operations, with O(log m) per update/sample after coordinate compression. Space complexity: O(m).
Hints
- Compress item IDs into fixed indices, then store current weights in a Fenwick tree or segment tree.
- A sample query is equivalent to finding the first index whose prefix-sum weight is at least r.