Sample index from probability distribution
Company: LinkedIn
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
You are given a discrete probability distribution for an \(M\)-sided die as an array `p[0..M-1]`, where each `p[i]` is non-negative.
Design a function `sample(p) -> int` that returns a random index `i` such that:
- \(\Pr(\text{sample}(p) = i) = p[i] / \sum_j p[j]\)
### Requirements
- The function will be called many times with the same `p`.
- Aim for \(O(\log M)\) time per sample after preprocessing.
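The requirements above suggest precomputing prefix sums once and binary-searching them on every call. Here is a minimal sketch of that idea, assuming Python's standard `bisect`, `itertools`, and `random` modules. The class name `WeightedSampler` and its `rng` parameter are hypothetical and not part of the problem statement.

```python
import bisect
import itertools
import random


class WeightedSampler:
    """Hypothetical helper: O(M) preprocessing, O(log M) per sample."""

    def __init__(self, p):
        if any(w < 0 for w in p):
            raise ValueError("weights must be non-negative")
        # prefix[i] = p[0] + ... + p[i], computed once.
        self.prefix = list(itertools.accumulate(p))
        if not self.prefix or self.prefix[-1] <= 0:
            raise ValueError("weights must have a positive sum")
        self.total = self.prefix[-1]
        # First index whose prefix sum reaches the total. It always has
        # positive weight and is used as a guard against float rounding.
        self.last = bisect.bisect_left(self.prefix, self.total)

    def sample(self, rng=random):
        # rng is assumed to expose random() returning a float in [0, 1).
        x = rng.random() * self.total
        # First index whose prefix sum is strictly greater than x.
        i = bisect.bisect_right(self.prefix, x)
        return min(i, self.last)
```

Each call performs a single binary search over the stored prefix sums. Raising `ValueError` when the total is zero is one possible choice, not the only reasonable one.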
### Follow-up
What if the probabilities do **not** sum to 1? Explain how your approach handles this.
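One way to reason about this follow-up, sketched under the assumption that the sampler scales a uniform draw by the total weight and returns the first index whose cumulative sum exceeds it. The symbols \(S_i\), \(T\), \(U\), and \(X\) are introduced here for illustration only.

\[
S_i = \sum_{j=0}^{i} p[j], \qquad S_{-1} = 0, \qquad T = S_{M-1} > 0, \qquad X = U \cdot T \ \text{ with } \ U \sim \mathrm{Uniform}[0,1)
\]

\[
\Pr(\text{return } i) = \Pr(S_{i-1} \le X < S_i) = \frac{S_i - S_{i-1}}{T} = \frac{p[i]}{\sum_j p[j]}
\]

Under this assumption, multiplying every weight by the same positive constant scales \(S_i\) and \(T\) together, so the ratio is unchanged and no explicit normalization step is needed.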
### Clarifications
- Assume you have access to a random number generator that can produce a uniform float in \([0,1)\) or a uniform integer in a range.
- If \(\sum_j p[j] = 0\), you may define behavior (e.g., throw an error).
Quick Answer: This question evaluates proficiency in randomized algorithms and probability-based sampling, along with algorithmic preprocessing and data-structure design for efficient repeated sampling.
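Given non-negative weights `p` and a list of explicit uniform draws ("tickets") in \([0,1)\), scale each ticket by the total weight and binary-search the cumulative sums for the first one strictly greater than the scaled ticket. The index of that cumulative sum is the sample.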
### Constraints
- Inputs are Python literals matching the function signature.
- Because the uniform draws are supplied as input, the output is deterministic and is checked by exact match.
### Examples
Input: ([0.2,0.3,0.5], [0.0,0.2,0.51,0.99])
Expected Output: [0, 1, 2, 2]
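Explanation: The cumulative sums are [0.2, 0.5, 1.0] and the total is 1. Each ticket maps to the first cumulative sum strictly greater than it: 0.0 maps to index 0, 0.2 sits exactly on the first boundary and so maps to index 1, and 0.51 and 0.99 both map to index 2.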
Input: ([2,3], [0.0,0.4,0.99])
Expected Output: [0, 1, 1]
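Explanation: Weights need not sum to one. The total is 5, so the tickets scale to 0.0, 2.0, and 4.95. Against the cumulative sums [2, 5], 0.0 maps to index 0, 2.0 sits exactly on the first boundary and maps to index 1, and 4.95 maps to index 1.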
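The two examples can be reproduced with a short deterministic routine. This is a sketch only: the function name `sample_indices` and its `(p, tickets)` signature are assumptions inferred from the example inputs, and ties on a boundary go to the higher index, as the expected outputs suggest.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def sample_indices(p, tickets):
    """Map each ticket u in [0, 1) to an index (hypothetical signature)."""
    prefix = list(accumulate(p))
    if not prefix or prefix[-1] <= 0:
        raise ValueError("weights must have a positive sum")
    total = prefix[-1]
    # Guard index for the rare case where u * total rounds up to total.
    last = bisect_left(prefix, total)
    # For each ticket, take the first prefix sum strictly greater than
    # u * total. Scaling by total means p does not have to sum to 1.
    return [min(bisect_right(prefix, u * total), last) for u in tickets]


print(sample_indices([0.2, 0.3, 0.5], [0.0, 0.2, 0.51, 0.99]))
print(sample_indices([2, 3], [0.0, 0.4, 0.99]))
```

The prefix sums are rebuilt on every call here for brevity. When the same `p` is reused many times, they would be computed once and cached.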
### Hints
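- Clarify edge cases before coding: zero-weight entries, an all-zero `p`, and a ticket that lands exactly on a cumulative-sum boundary.
- Keep the return value deterministic by using the supplied tickets rather than calling a random number generator.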