Design robust metrics for a feature launch
Company: TikTok
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Technical Screen
You are launching a new Quick Reply feature in a messaging app and must define metrics and guardrails for a one-week A/B test running from 2025-08-25 to 2025-09-01. Be precise about denominators, units of analysis, and attribution windows.
1) Define the primary success metric (north-star) that captures meaningful user value from Quick Reply. Provide the exact formula, including: the unit of analysis (user or user-day), numerator, denominator, inclusion criteria (e.g., exposure definition), and a 24-hour attribution rule from Quick Reply click to reply send.
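For illustration only, here is a minimal sketch of how such a metric could be computed, assuming hypothetical `exposures` and `qr_events` event tables; the schemas, column names, and sample values are assumptions made for the sketch, not actual telemetry.

```python
import pandas as pd

# Assumed telemetry (illustrative schemas, not real logs):
# exposures -> one row per Quick Reply entry point the user actually saw
# qr_events -> one row per Quick Reply click, with the send timestamp if any
exposures = pd.DataFrame({
    "user_id": [1, 2, 3],
    "exposure_ts": pd.to_datetime(["2025-08-25 10:00", "2025-08-25 12:00", "2025-08-27 08:00"]),
})
qr_events = pd.DataFrame({
    "user_id": [1, 2, 2],
    "click_ts": pd.to_datetime(["2025-08-25 10:05", "2025-08-26 13:00", "2025-08-28 09:00"]),
    "send_ts":  pd.to_datetime(["2025-08-25 10:06", pd.NaT, "2025-08-29 10:00"]),
})

ATTRIBUTION = pd.Timedelta(hours=24)

# A click converts only if the reply send happens within 24 hours of the click.
qr_events["attributed_send"] = (qr_events["send_ts"] - qr_events["click_ts"]) <= ATTRIBUTION

# Unit of analysis: exposed user.
# Denominator: users with >=1 true exposure during the test window.
# Numerator: exposed users with >=1 attributed Quick Reply send.
exposed_users = set(exposures["user_id"])
converters = set(qr_events.loc[qr_events["attributed_send"], "user_id"]) & exposed_users
print(f"Quick Reply send rate per exposed user: {len(converters) / len(exposed_users):.3f}")
```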
2) Propose at least two guardrail metrics that protect long-term health (e.g., reply quality/abuse rate, churn, app crashes). For each, specify the measurement unit, the formula, and an acceptable movement threshold.
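A sketch of what the guardrail definitions could look like in code; the metric names, formulas, and thresholds below are assumptions chosen for illustration, not established standards.

```python
# Illustrative guardrail definitions; names, formulas, and thresholds are assumed.
GUARDRAILS = {
    # unit: session; numerator: sessions ending in a crash; denominator: all sessions
    "crash_rate": {"max_relative_increase": 0.01},
    # unit: sent reply; numerator: replies reported or flagged; denominator: sent replies
    "reply_report_rate": {"max_relative_increase": 0.02},
    # unit: user; numerator: users inactive 7 days post-exposure; denominator: exposed users
    "short_term_churn": {"max_relative_increase": 0.005},
}

def guardrail_ok(name: str, control_value: float, treatment_value: float) -> bool:
    """Return True if the treatment stays within the allowed relative movement."""
    allowed = GUARDRAILS[name]["max_relative_increase"]
    relative_change = (treatment_value - control_value) / control_value
    return relative_change <= allowed

# Example: crash rate moves from 0.40% to 0.41% (+2.5% relative) -> fails the 1% cap.
print(guardrail_ok("crash_rate", 0.0040, 0.0041))
```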
3) Suppose the overall reply send rate increases, but average conversation length decreases and the complaint rate among Ads-acquired users rises. Explain how you would segment and interpret the metrics to avoid Simpson's paradox and decide whether to ship, hold, or iterate.
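The toy numbers below (assumed, not real data) show the Simpson's paradox risk this part is probing: the aggregate rate can rise even though every segment gets slightly worse, purely because the treatment shifts which users end up in the exposed denominator. Reweighting the treatment segments to the control arm's exposure mix is one way to check this.

```python
import pandas as pd

# Assumed segment-level counts for illustration.
seg = pd.DataFrame({
    "arm":           ["control", "control", "treatment", "treatment"],
    "segment":       ["organic", "ads_acquired", "organic", "ads_acquired"],
    "exposed_users": [5000, 5000, 8000, 2000],
    "repliers":      [1500, 500, 2240, 180],
})
seg["reply_rate"] = seg["repliers"] / seg["exposed_users"]

# Naive aggregate comparison (confounded by the exposure-mix shift).
agg = seg.groupby("arm")[["repliers", "exposed_users"]].sum()
agg["reply_rate"] = agg["repliers"] / agg["exposed_users"]
print(agg["reply_rate"])  # control 0.200 vs treatment 0.242 -- looks like a win

# Segment-level view: both segments actually decline (0.30 -> 0.28, 0.10 -> 0.09).
print(seg.pivot(index="segment", columns="arm", values="reply_rate"))

# Mix-adjusted treatment rate: reweight segments to the control arm's exposure mix.
control = seg[seg["arm"] == "control"].set_index("segment")
treatment = seg[seg["arm"] == "treatment"].set_index("segment")
weights = control["exposed_users"] / control["exposed_users"].sum()
print("mix-adjusted treatment rate:", (treatment["reply_rate"] * weights).sum())  # 0.185
```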
4) Define a metric to detect “empty engagement” (e.g., accidental taps or replies that are deleted before send). Describe how to implement it using existing telemetry and how it would influence the launch decision.
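One possible implementation sketch, assuming hypothetical `quick_reply_tap` and `reply_send` telemetry linked by a `draft_id`; the event names, the 3-second rapid-repeat rule, and the sample values are all assumptions for illustration.

```python
import pandas as pd

# Assumed telemetry: taps on the Quick Reply entry point and the drafts that were sent.
taps = pd.DataFrame({
    "draft_id": [101, 102, 103, 104],
    "user_id": [1, 1, 2, 3],
    "tap_ts": pd.to_datetime([
        "2025-08-25 10:00:00", "2025-08-25 10:00:02",
        "2025-08-26 09:00:00", "2025-08-27 12:00:00",
    ]),
})
sends = pd.DataFrame({"draft_id": [101, 104]})  # drafts that became sent replies

flagged = taps.copy()
# Rule 1: tap never resulted in a send (abandoned or deleted draft).
flagged["no_send"] = ~flagged["draft_id"].isin(sends["draft_id"])
# Rule 2: repeated taps by the same user within 3 seconds look accidental.
flagged = flagged.sort_values(["user_id", "tap_ts"])
gap = flagged.groupby("user_id")["tap_ts"].diff()
flagged["rapid_repeat"] = gap <= pd.Timedelta(seconds=3)
flagged["empty_engagement"] = flagged["no_send"] | flagged["rapid_repeat"]

# Metric: share of Quick Reply taps that are empty engagement (unit: tap).
print(f"empty engagement rate: {flagged['empty_engagement'].mean():.2f}")
```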
5) Lay out a pre-analysis plan: how you will pre-register primary/secondary metrics, define stopping rules, and prevent metric fishing. Include how you would validate that "exposed" truly means the user saw the Quick Reply entry point, not merely that they were eligible.
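A sketch of how the pre-registration and basic validity checks might be encoded; the plan fields, thresholds, and the impression-coverage heuristic are assumptions made for the sketch, not a prescribed process.

```python
from scipy.stats import chisquare

# Illustrative pre-registration record (field names and values are assumed).
PRE_ANALYSIS_PLAN = {
    "experiment": "quick_reply_launch",
    "window": ("2025-08-25", "2025-09-01"),
    "primary_metric": "reply_send_rate_per_exposed_user",
    "guardrails": ["crash_rate", "reply_report_rate", "short_term_churn"],
    "alpha": 0.05,
    "analysis_time": "after the full week only (no peeking)",
    "exposure_definition": "client-side impression event for the Quick Reply entry point",
}

def sample_ratio_mismatch(control_n: int, treatment_n: int, alpha: float = 0.001) -> bool:
    """Flag assignment imbalance that would invalidate the readout (expected 50/50 split)."""
    _, p = chisquare([control_n, treatment_n])
    return p < alpha

def exposure_looks_valid(eligible_users: int, impression_users: int) -> bool:
    """Sanity check that 'exposed' means an impression was actually rendered:
    impression users must be a subset of eligible users, and coverage should not
    sit implausibly close to 100% (which would suggest eligibility was logged
    as exposure)."""
    coverage = impression_users / eligible_users
    return impression_users <= eligible_users and coverage < 0.99

print(sample_ratio_mismatch(500_000, 495_000))   # a 1% shortfall at this scale is a clear SRM
print(exposure_looks_valid(1_000_000, 620_000))  # plausible impression coverage
```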
Quick Answer: This question evaluates a data scientist's ability to define robust A/B test metrics and guardrails with precise units of analysis, denominators, and exposure/attribution windows; to detect empty or accidental engagement via telemetry; and to pre-register a valid analysis plan.