PracHub

Implement streaming SRM detector with late events

Last updated: Mar 29, 2026

Quick Overview

This question evaluates stateful stream processing, statistical detection of sample ratio mismatch (a chi-square test with Yates correction), and scalable system design: deduplication, late and out-of-order event handling, real-time aggregation, bot-rate mitigation, sharding, and replay validation.

Company: TikTok

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: HR Screen

Implement a streaming detector for sample ratio mismatch (SRM) across many concurrent experiments. The input is two topic-partitioned streams: assignments (experiment_id, user_id, variant, ts) and pageviews (user_id, ts).

Requirements:

  1) Deduplicate per (experiment_id, user_id) using idempotent state.
  2) Maintain rolling counts per variant in O(1) memory per experiment (no raw buffering), supporting late or out-of-order events up to 24 hours.
  3) Every minute, compute a chi-square goodness-of-fit test with Yates correction against the target split, and raise an alert if p < 1e-4 and the absolute difference is ≥ 0.3 percentage points.
  4) Guard against bot bursts by excluding users with more than N assignments per minute.
  5) Give the complexity and pseudocode for a single-threaded worker, and describe how you would shard it.
  6) Explain how you would validate the detector in replay without leaking ground truth.
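
The sketch below is one possible single-threaded starting point for requirements 1 to 3, not a reference answer. It assumes two variants per experiment (so the test has one degree of freedom and the p-value can be taken from `math.erfc`), epoch-second timestamps, and hypothetical names (`SrmWorker`, `ExperimentState`, `yates_chi_square_p`) that do not come from the prompt. The bot guard, the pageviews stream, and sharding are left out, and the exact in-memory dedup set is a simplification: at scale it would likely need a keyed store with a TTL or a probabilistic structure.

```python
import math

WINDOW_MIN = 24 * 60   # rolling window and lateness allowance, in minutes
P_THRESHOLD = 1e-4     # alert threshold on the p-value (from the prompt)
MIN_ABS_DIFF = 0.003   # 0.3 percentage points, expressed as a fraction


def yates_chi_square_p(observed, target):
    """Goodness-of-fit p-value with Yates correction (assumes 2 variants, df = 1)."""
    total = sum(observed)
    if total == 0:
        return 1.0
    stat = 0.0
    for obs, share in zip(observed, target):
        expected = total * share
        stat += max(0.0, abs(obs - expected) - 0.5) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


class ExperimentState:
    """Fixed-size ring of per-minute variant counts for one experiment."""

    def __init__(self, variants, target):
        self.index = {v: i for i, v in enumerate(variants)}
        self.target = list(target)
        self.slot_minute = [None] * WINDOW_MIN
        self.slots = [[0] * len(self.target) for _ in range(WINDOW_MIN)]
        self.watermark = -math.inf  # newest event minute seen so far

    def add(self, variant, ts):
        minute = int(ts // 60)
        self.watermark = max(self.watermark, minute)
        if minute <= self.watermark - WINDOW_MIN:
            return False  # later than the 24-hour allowance: drop
        idx = minute % WINDOW_MIN
        if self.slot_minute[idx] != minute:
            # The slot still holds an expired minute: reuse it.
            self.slots[idx] = [0] * len(self.target)
            self.slot_minute[idx] = minute
        self.slots[idx][self.index[variant]] += 1
        return True

    def counts(self, now_minute):
        totals = [0] * len(self.target)
        for minute, slot in zip(self.slot_minute, self.slots):
            if minute is not None and minute > now_minute - WINDOW_MIN:
                for i, c in enumerate(slot):
                    totals[i] += c
        return totals


class SrmWorker:
    def __init__(self):
        self.experiments = {}
        self.seen = set()  # (experiment_id, user_id); exact, so it grows with users

    def register(self, experiment_id, variants, target):
        self.experiments[experiment_id] = ExperimentState(variants, target)

    def on_assignment(self, experiment_id, user_id, variant, ts):
        key = (experiment_id, user_id)
        if key in self.seen:
            return False  # idempotent: duplicates and replays are no-ops
        if self.experiments[experiment_id].add(variant, ts):
            self.seen.add(key)
            return True
        return False

    def check(self, experiment_id, now_ts):
        """Run once per minute; returns True when an SRM alert should fire."""
        state = self.experiments[experiment_id]
        observed = state.counts(int(now_ts // 60))
        total = sum(observed)
        if total == 0:
            return False
        p = yates_chi_square_p(observed, state.target)
        diff = max(abs(o / total - t) for o, t in zip(observed, state.target))
        return p < P_THRESHOLD and diff >= MIN_ABS_DIFF
```

Each event costs O(1) time, and each check scans a fixed 1,440 slots, so per-experiment count state stays constant in size. Because the counts are keyed by experiment_id, one natural way to shard this sketch would be to partition the assignments topic by experiment_id so that each worker owns its experiments outright.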

Related Interview Questions

  • Parse a nested list from a string - TikTok (medium)
  • Implement stacks, streaming median, and upward path sum - TikTok (easy)
  • Maximize sum with no adjacent elements - TikTok (medium)
  • Implement stack variants and path-sum check - TikTok (medium)
  • Find the longest palindromic substring - TikTok (easy)
Oct 13, 2025, 9:49 PM

