This question evaluates understanding of sampling from discrete probability distributions, randomized algorithm design, handling unnormalized weights and expected-value analysis for rejection-style approaches, along with trade-offs for scalable implementations.
Given an array weights[0..M-1] representing a discrete distribution over M outcomes, implement a function sampleIndex(weights) that returns an index i with probability proportional to weights[i].
Assume all weights[i] >= 0 and at least one weight is positive.
Follow-ups (handle in the same discussion/solution):
weights
do
not
sum to
1
(they are unnormalized)? Provide at least two ways to handle this.
1
, what is the
expected number of trials
as a function of the total sum?
M
is very large and the probability mass is highly concentrated on a small number of indices, what engineering/algorithmic optimizations would you consider? Discuss trade-offs (time, memory, preprocessing cost, update cost).