Weighted Random Sampling Generator (Streaming)
You are given:
-
A list of distinct integers values.
-
A matching list of nonnegative probabilities (weights) that should sum to 1 within a small tolerance.
Implement a Python solution that supports streaming (potentially infinite) weighted draws, validates inputs, and is numerically robust.
Requirements
-
Input validation
-
values and weights must be the same nonzero length.
-
values must contain distinct integers.
-
weights must be nonnegative; zero-weight items are allowed and must never be sampled.
-
The sum of weights should be 1 within a small tolerance; handle floating-point rounding and renormalize when needed.
-
Generator interface
-
Provide a generator that yields an infinite stream of draws following the specified distribution.
-
Allow deterministic behavior via a random seed.
-
Efficient sampling methods
-
Implement at least one efficient method and optionally both:
-
Prefix sums with binary search (CDF + bisect): O(n) preprocess, O(log n) per sample, O(n) space.
-
Alias method (Vose/Walker): O(n) preprocess, O(1) per sample, O(n) space.
-
Explain time/space trade-offs and when to use each.
-
Numerical robustness
-
Clamp tiny negative weights caused by rounding to zero.
-
Renormalize weights if their sum differs from 1 by more than a tolerance.
-
Ensure the final CDF ends at exactly 1.0 to avoid edge-case misses.
-
Tests
-
Unit tests with:
-
Deterministic checks via random seeding (same seed → same sequence; different seeds → different sequences with high probability).
-
Statistical verification over many samples (e.g., 50k–200k) that empirical frequencies are close to target probabilities.
-
Edge-case validation (zeros, invalid inputs).
Provide clean, well-documented code and explain the design choices and trade-offs.