Weighted Random Sampling by Index
Company: Tubitv
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question tests a candidate's ability to implement weighted random sampling using prefix sums and binary search, evaluating practical algorithm design for probability distributions. It assesses proficiency in translating mathematical probability constraints into efficient data structures, a skill commonly probed in machine learning engineering roles where non-uniform sampling is fundamental to model training and inference pipelines.
Constraints
- 1 <= n = len(weights) <= 10^5
- 1 <= weights[i] <= 10^5
- 0 <= target < sum(weights)
- The total sum of weights fits in a 64-bit integer.
Examples
Input: ([1, 3], 0)
Expected Output: 0
Explanation: Prefix sums [1, 4]. target=0 < 1, so it lands in bucket 0 (range [0,1)).
Input: ([1, 3], 1)
Expected Output: 1
Explanation: Prefix sums [1, 4]. target=1 is the start of bucket 1 (range [1,4)), so index 1.
Input: ([1, 3], 3)
Expected Output: 1
Explanation: target=3 is still within bucket 1's range [1,4), so index 1. (sum is 4, valid targets are 0..3.)
Input: ([1, 2, 3, 4], 0)
Expected Output: 0
Explanation: Prefix sums [1,3,6,10]. target=0 is in bucket 0 (range [0,1)).
Input: ([1, 2, 3, 4], 2)
Expected Output: 1
Explanation: target=2 falls in bucket 1's range [1,3), so index 1.
Input: ([1, 2, 3, 4], 3)
Expected Output: 2
Explanation: target=3 is the start of bucket 2's range [3,6) — the boundary case the <= comparison must handle, so index 2.
Input: ([1, 2, 3, 4], 6)
Expected Output: 3
Explanation: target=6 is the start of bucket 3's range [6,10), so index 3 (which should be picked ~40% of the time).
Input: ([1, 2, 3, 4], 9)
Expected Output: 3
Explanation: target=9 is the largest valid target (sum=10), still in bucket 3's range [6,10), so index 3.
Input: ([5], 0)
Expected Output: 0
Explanation: Single weight: every target maps to index 0.
Input: ([5], 4)
Expected Output: 0
Explanation: Single weight, target=4 < 5: still index 0. Exercises the lo==hi base case of the binary search.
Input: ([2, 2, 2], 0)
Expected Output: 0
Explanation: Equal weights, prefix [2,4,6]. target=0 in bucket 0 ([0,2)).
Input: ([2, 2, 2], 2)
Expected Output: 1
Explanation: target=2 is the boundary into bucket 1 ([2,4)), so index 1.
Input: ([2, 2, 2], 5)
Expected Output: 2
Explanation: target=5 in bucket 2 ([4,6)), so index 2.
Hints
- Precompute prefix sums P where P[i] = weights[0] + ... + weights[i]. The index i 'owns' the half-open range [P[i-1], P[i]) of target values (with P[-1] = 0).
- You want the smallest index i with P[i] > target. This is a lower_bound / first-true binary search, not an exact-match search — the classic off-by-one trap is using `<` vs `<=` in the comparison.
- Use the invariant `while lo < hi: mid = (lo+hi)//2; if P[mid] <= target: lo = mid+1 else: hi = mid`, then return lo. This converges lo to the first bucket strictly above target without needing a separate found/not-found case.