Implement streaming per-user reservoir sampling
Company: Stripe
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's competency in streaming algorithms, per-user reservoir sampling, probabilistic reasoning and proof of correctness, concurrent/distributed merging, and engineering trade-offs for memory and amortized update efficiency.
Constraints
- Inputs are Python literals matching the function signature.
- Return a deterministic exact-match value.
Examples
Input: ([("u1",1,"a"),("u1",2,"b"),("u1",3,"c"),("u2",1,"x")], 2, 42)
Expected Output: {'u1': [[1, 'a'], [2, 'b']], 'u2': [[1, 'x']]}
Explanation: Reservoir keeps at most M events per user with seeded randomness.
Input: ([], 3, 1)
Expected Output: {}
Explanation: Empty stream has no samples.
Hints
- Clarify edge cases before coding.
- Keep the return value deterministic.