Diagnose and decide on watch-time drop
Company: Disney
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: HR Screen
On 2025-08-25, Hulu shipped a new homepage ranking model to 100% of iOS traffic. Between 2025-08-25 and 2025-08-31, average watch-time per session among new users on iOS fell by 2.7%, while Android was flat. A major content premiere occurred on 2025-08-29, and there was a 20-minute payments outage on 2025-08-27 affecting subscription starts. No holdout was configured. Within 48 hours, recommend whether to roll back, continue, or mitigate.
Answer the following:
- Define the primary decision metric(s) and guardrails (e.g., session starts, error rate, churn proxy). Set concrete decision thresholds.
- Propose an identification strategy without a built-in holdout: (1) a difference-in-differences using Android as a control with a pre-period 2025-08-18–2025-08-24; (2) a geo-based synthetic control within iOS. Specify assumptions (parallel trends, no interference), diagnostics, and falsification tests.
- Adjust for seasonality, the 2025-08-29 premiere, and the 2025-08-27 outage. What fixed effects or controls do you include? How do you handle heterogeneity by country, platform version, and new vs returning users?
- Quantify impact with confidence intervals and provide a back-of-the-envelope weekly revenue effect (state assumptions for ad- and subscription-revenue paths).
- Outline a fast remediation plan (e.g., revert, throttle, feature flags, guardrail monitors), and a forward plan for a proper experiment design in September (holdouts, pre-registration, CUPED, exposure logging).
Quick Answer: This question evaluates a data scientist's skills in causal inference, experiment analytics, metric definition and guardrails, confounder adjustment, and rapid decision-making under time pressure.