PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates a candidate's competency in streaming algorithms, per-user reservoir sampling, probabilistic reasoning and proof of correctness, concurrent/distributed merging, and engineering trade-offs for memory and amortized update efficiency.

  • Medium
  • Stripe
  • Coding & Algorithms
  • Data Scientist

Implement streaming per-user reservoir sampling

Company: Stripe

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Design and code (in Python) a streaming algorithm that ingests an unbounded event stream of tuples (user_id, event_time, event_type) and maintains, for each user, at most M events that are a uniform random sample without replacement of all events seen so far for that user. Requirements: (1) Use reservoir sampling so each event has equal inclusion probability; prove correctness. (2) Achieve amortized O(1) update per event and O(U*M) memory, where U is number of active users. (3) Support concurrent shards with deterministic merging given a random seed. (4) Provide unit tests that verify marginal inclusion probabilities and that increasing M reduces variance of feature estimates derived from the reservoir. (5) As an extension, maintain a real-time top-K users by event count using a size-K min-heap that supports increment/decrement when late events are revoked; state time and space complexities.

Quick Answer: This question evaluates a candidate's competency in streaming algorithms, per-user reservoir sampling, probabilistic reasoning and proof of correctness, concurrent/distributed merging, and engineering trade-offs for memory and amortized update efficiency.

Maintain at most M sampled events per user using reservoir sampling with a deterministic seed for exact grading.

Constraints

  • Inputs are Python literals matching the function signature.
  • Return a deterministic exact-match value.

Examples

Input: ([("u1",1,"a"),("u1",2,"b"),("u1",3,"c"),("u2",1,"x")], 2, 42)

Expected Output: {'u1': [[1, 'a'], [2, 'b']], 'u2': [[1, 'x']]}

Explanation: Reservoir keeps at most M events per user with seeded randomness.

Input: ([], 3, 1)

Expected Output: {}

Explanation: Empty stream has no samples.

Hints

  1. Clarify edge cases before coding.
  2. Keep the return value deterministic.
Last updated: Jun 27, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Assign Reviewers from Changed Files - Stripe (medium)
  • Generate Account Email Notifications - Stripe (medium)
  • Calculate Transaction Fees - Stripe (medium)
  • Build an Account Transfer Ledger - Stripe (medium)
  • Implement Validation and String Compression - Stripe (hard)