This question evaluates algorithm design and probabilistic reasoning for weighted random sampling, including data-structure preprocessing, time-space trade-offs, expected versus worst-case performance, and attention to numerical precision.
You are given an array of positive weights `w[0..n-1]`. Implement a data structure that supports:

- `init(w)`: preprocess the weights.
- `pick() -> int`: return an index `i` such that `P(pick() = i) = w[i] / sum(w)`.

`pick()` must run in O(log n) time after preprocessing.
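One standard way to meet the O(log n) bound is to precompute prefix sums and binary-search them on each call. A minimal Python sketch (the class name `WeightedPicker` is my own):

```python
import bisect
import random

class WeightedPicker:
    """Prefix sums + binary search: O(n) init, O(log n) pick."""

    def __init__(self, w):
        # prefix[i] = w[0] + ... + w[i]; index i owns the half-open
        # interval [prefix[i-1], prefix[i]) of length w[i].
        self.prefix = []
        total = 0
        for x in w:
            total += x
            self.prefix.append(total)
        self.total = total

    def pick(self):
        # Draw a uniform point in [0, total) and return the index of
        # the first prefix sum strictly greater than it.
        r = random.random() * self.total
        return bisect.bisect_right(self.prefix, r)
```

`bisect_right` is used so that a draw landing exactly on a boundary is assigned to the next interval, keeping the intervals half-open and disjoint.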
Follow-up: if `pick()` will be called millions of times, how would you redesign or precompute so that each `pick()` call takes O(1) expected time (while keeping preprocessing reasonable)?
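The usual answer here is the alias method (Vose's variant), which after O(n) preprocessing actually answers each query in O(1) worst case, not just expected. A hedged sketch:

```python
import random

def build_alias(w):
    """Vose's alias method: O(n) preprocessing.

    Each of the n columns holds probability mass 1/n, split between
    its own index (with probability prob[i]) and a donor alias[i].
    """
    n = len(w)
    total = sum(w)
    prob = [x * n / total for x in w]   # scaled so the average is 1.0
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        alias[s] = l                    # l donates mass to fill s's column
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    # Any leftovers have prob ~= 1.0 up to rounding; leave them as-is.
    return prob, alias

def alias_pick(prob, alias):
    """O(1) pick: one uniform column, one biased coin flip."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```

The trade-off versus prefix sums: the alias table costs two arrays of size n and a slightly trickier build, but each query is just one array lookup and one coin flip.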
Also discuss numerical concerns: possible overflow when computing `sum(w)` with integer weights, and floating-point pitfalls if using doubles (accumulated rounding in the prefix sums can shift interval boundaries, and a draw can land exactly on or past the final sum).
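When the weights are integers, one way to sidestep both pitfalls is to keep everything in integer arithmetic: draw a uniform integer in `[0, total)` and binary-search integer prefix sums, so no rounding can occur (in Python integers are arbitrary-precision; in C/C++ you would need a wide enough type for the total). A sketch under that assumption:

```python
import bisect
import random
from itertools import accumulate

def pick_exact(prefix):
    """Exact weighted pick given integer prefix sums.

    randrange(total) is a uniform integer in [0, total), so every
    interval boundary comparison is exact -- no doubles involved.
    """
    r = random.randrange(prefix[-1])
    return bisect.bisect_right(prefix, r)

# Usage: precompute once, then call pick_exact(prefix) per query.
weights = [2, 3]
prefix = list(accumulate(weights))
```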