Analyze a geo rollout and interpret charts
Company: Pinterest
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Onsite
You launch a new onboarding flow on 2025-07-15 only in Texas and Florida. A week later, execs notice a 3% nationwide DAU dip on a line chart. Design an analysis and answer precisely:
1) Causal question: Did the feature cause a DAU change in the treated states? Specify the estimand and write the difference-in-differences (DiD) formula explicitly.
2) Using the following summary (daily averages; pre = 2025-06-15..2025-07-14, post = 2025-07-15..2025-08-14): Texas pre=500k, post=515k; Florida pre=300k, post=303k; control pool (other states) pre=2,000k, post=2,060k. Compute the DiD for each treated state and for both states combined (population-weighted). Interpret its sign and magnitude.
3) The chart also shows a weekend trough pattern and a visible break on 2025-07-20. Outline a segmented regression on the time series to quantify the immediate level change and the slope change for treated vs control states. Include the regression equation and explain how you’d cluster standard errors.
4) Guardrail metrics: propose at least three (e.g., crash rate, latency p95, payment decline rate). Define decision thresholds and state which guardrails are one-sided vs two-sided.
5) Power and duration: with a combined average daily DAU of 815k across the treated states, an MDE of 0.5% relative on DAU, alpha = 0.05, and power = 0.8, estimate the number of days required under a parallel-trends DiD. State your assumptions and whether CUPED or synthetic controls would reduce the required duration.
6) The chart is noisy around 2025-07-27, after a marketing campaign in California. Explain how you’d validate the parallel-trends assumption and how you’d choose a donor pool or weights to mitigate spillovers.
7) Provide a brief go/no-go recommendation and the exact additional data you’d request to de-risk the decision.
Quick Answer: This question evaluates causal inference and product analytics competencies, specifically the specification of causal estimands and difference-in-differences, segmented (interrupted) time-series regression, power and duration calculations, spillover and donor-pool diagnostics, and the selection of guardrail metrics.
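For part 1, one way to state the estimand is the average treatment effect on the treated (ATT) in the post-launch window, measured on log DAU so that two states can be compared with a much larger control pool. The notation (state s, day t, treatment indicator D_s, post indicator Post_t for days from 2025-07-15 on, group g in {T, C}, window w in {pre, post}) is introduced here for illustration:

```latex
\tau_{\mathrm{ATT}} = \mathbb{E}\!\left[\log Y_{st}(1) - \log Y_{st}(0) \,\middle|\, D_s = 1,\ \mathrm{Post}_t = 1\right]

\hat{\tau}_{\mathrm{DiD}} = \left(\bar{y}_{T,\mathrm{post}} - \bar{y}_{T,\mathrm{pre}}\right) - \left(\bar{y}_{C,\mathrm{post}} - \bar{y}_{C,\mathrm{pre}}\right), \qquad \bar{y}_{g,w} = \log \bar{Y}_{g,w}

\log Y_{st} = \alpha_s + \lambda_t + \tau \, D_s \, \mathrm{Post}_t + \varepsilon_{st}
```

Here \bar{Y}_{g,w} is group g's average daily DAU in window w. The two-period DiD identifies the ATT only under parallel trends: without the launch, treated and control log DAU would have moved in parallel. The regression with state effects \alpha_s and day effects \lambda_t is the daily-panel analogue and is the more convenient form for adding covariates and clustered standard errors.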
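For part 2, the DiD can be computed on several scales. Raw level differences are misleading here because the control pool is 2.5 times larger than the treated states, so the sketch below also reports the DiD in relative growth (percentage points), in log points, and as post-period DAU minus a counterfactual that grows each treated state's pre level at the control pool's rate. State populations are not given, so "population-weighted" is approximated by summing Texas and Florida DAU, which weights each state's growth by its pre-period DAU; that substitution is an assumption.

```python
import math

# Daily-average DAU in thousands, taken from the summary in part 2: (pre, post).
DAU = {
    "Texas": (500.0, 515.0),
    "Florida": (300.0, 303.0),
    "Control": (2000.0, 2060.0),
}


def growth(pre: float, post: float) -> float:
    """Relative change from the pre window to the post window."""
    return post / pre - 1.0


def did(treated: tuple, control: tuple) -> dict:
    """DiD of one treated arm against the control pool, on several scales."""
    (t_pre, t_post), (c_pre, c_post) = treated, control
    counterfactual = t_pre * c_post / c_pre  # treated pre level grown at the control rate
    return {
        "level_k": (t_post - t_pre) - (c_post - c_pre),  # raw levels: not scale-comparable here
        "rel_pp": 100.0 * (growth(t_pre, t_post) - growth(c_pre, c_post)),
        "log": math.log(t_post / t_pre) - math.log(c_post / c_pre),
        "effect_k": t_post - counterfactual,  # observed post DAU minus counterfactual
    }


control = DAU["Control"]
# Summing DAU weights each state's growth by its pre-period DAU (assumed proxy for population).
combined = tuple(a + b for a, b in zip(DAU["Texas"], DAU["Florida"]))
results = {
    "Texas": did(DAU["Texas"], control),
    "Florida": did(DAU["Florida"], control),
    "Combined": did(combined, control),
}

for name, r in results.items():
    print(f"{name:8s} rel DiD {r['rel_pp']:+.2f} pp | log DiD {r['log']:+.4f} | "
          f"effect {r['effect_k']:+.1f}k DAU | level DiD {r['level_k']:+.0f}k")
```

On the summary figures, Texas grows exactly as fast as the control pool (+3.0% vs +3.0%, a DiD of 0 pp), Florida lags by 2.0 pp (303k observed vs a 309k counterfactual, about 6k DAU), and the combined treated arm lags by 0.75 pp (818k observed vs an 824k counterfactual, again about 6k DAU). The negative sign points to a small shortfall concentrated in Florida, but with one pre and one post window there is no standard error yet; that is what the daily regression in part 3 adds. Summing all three rows gives 2,800k pre and 2,878k post (+2.8%), so the 30-day averages alone do not show the 3% nationwide dip seen on the chart.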
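For part 3, a minimal sketch of a segmented (interrupted time-series) DiD regression on log DAU, fit to simulated data because the question gives no daily series. Everything in the simulation is hypothetical: the ten control states, the injected effects `TRUE_LEVEL` and `TRUE_SLOPE`, the weekend trough size, and the noise level. The break is placed at the visible 2025-07-20 break; setting `BREAK = date(2025, 7, 15)` tests the launch date instead. Day-of-week dummies absorb the weekend trough, and standard errors are clustered by state with a hand-written CR1 sandwich estimator in NumPy.

```python
from datetime import date, timedelta

import numpy as np

# Model (b2, b5, b6 are treated-minus-control differences; t0 = BREAK):
# log DAU_st = a_s + b1*t + b2*T_s*t + b3*Post_t + b4*Post_t*(t - t0)
#              + b5*T_s*Post_t + b6*T_s*Post_t*(t - t0) + day-of-week dummies + e_st
# b5 = immediate level change, b6 = slope change.
START, BREAK, END = date(2025, 6, 15), date(2025, 7, 20), date(2025, 8, 14)
DAYS = [START + timedelta(days=i) for i in range((END - START).days + 1)]
STATES = ["TX", "FL"] + [f"C{i}" for i in range(10)]  # C0..C9 are hypothetical controls
TREATED = {"TX", "FL"}
TRUE_LEVEL, TRUE_SLOPE = -0.010, -0.0002  # injected effects on log DAU (assumed values)

rng = np.random.default_rng(7)
cols = (["state_" + s for s in STATES]
        + ["t", "treat_x_t", "post", "post_x_tsince", "treat_x_post", "treat_x_post_x_tsince"]
        + [f"dow_{k}" for k in range(1, 7)])
X_rows, y, groups = [], [], []
for g, s in enumerate(STATES):
    treat = 1.0 if s in TREATED else 0.0
    base = rng.normal(12.0, 0.5)  # state-specific log-DAU level
    for i, d in enumerate(DAYS):
        post = 1.0 if d >= BREAK else 0.0
        tsince = float((d - BREAK).days) if post else 0.0
        dow = [1.0 if d.weekday() == k else 0.0 for k in range(1, 7)]  # Monday = baseline
        fe = [1.0 if j == g else 0.0 for j in range(len(STATES))]  # state fixed effects
        X_rows.append(fe + [i, treat * i, post, post * tsince,
                            treat * post, treat * post * tsince] + dow)
        weekend = -0.05 if d.weekday() >= 5 else 0.0  # simulated weekend trough
        effect = treat * post * (TRUE_LEVEL + TRUE_SLOPE * tsince)
        y.append(base + 0.0003 * i + 0.002 * post + weekend + effect + rng.normal(0.0, 0.0005))
        groups.append(g)

X, y, groups = np.array(X_rows, dtype=float), np.array(y), np.array(groups)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta


def cluster_robust_se(X, resid, groups):
    """CR1 cluster-robust (sandwich) standard errors, clustering on `groups`."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]  # summed score within one cluster
        meat += np.outer(score, score)
    n, k, G = X.shape[0], X.shape[1], len(np.unique(groups))
    scale = G / (G - 1) * (n - 1) / (n - k)  # small-sample correction
    return np.sqrt(np.diag(scale * bread @ meat @ bread))


se = cluster_robust_se(X, resid, groups)
for name in ["treat_x_post", "treat_x_post_x_tsince"]:
    j = cols.index(name)
    print(f"{name}: {beta[j]:+.5f} (cluster SE {se[j]:.5f})")
```

Clustering by state allows arbitrary serial correlation within a state over time. With only two treated states, though, cluster-robust SEs are unreliable, so in practice a wild cluster bootstrap or placebo-in-space inference (re-running the model with each control state as a fake treated unit) would be a safer basis for the p-value.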
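For part 4, a sketch of how decision thresholds could be encoded. Guardrails where only increases are harmful (crash rate, latency p95, payment decline rate) are treated as one-sided non-inferiority tests; a metric where movement in either direction is a concern is treated as a two-sided equivalence test (TOST). The inputs are a DiD estimate of the relative change and its standard error. The `Guardrail` class, the `evaluate` function, and every margin value are hypothetical placeholders, not recommended thresholds.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class Guardrail:
    name: str
    margin: float           # largest tolerated relative change (hypothetical placeholder)
    one_sided: bool = True  # True: only increases are harmful


def evaluate(g: Guardrail, rel_change: float, se: float, alpha: float = 0.05) -> str:
    """Return 'pass' or 'block' given a DiD estimate of relative change and its SE."""
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    upper, lower = rel_change + z * se, rel_change - z * se
    if g.one_sided:
        # Non-inferiority: pass only if the upper bound rules out harm beyond the margin.
        return "pass" if upper < g.margin else "block"
    # Two-sided equivalence (TOST): pass only if both bounds sit inside (-margin, +margin).
    return "pass" if -g.margin < lower and upper < g.margin else "block"


# Hypothetical guardrails and margins, for illustration only.
GUARDRAILS = [
    Guardrail("crash_rate", margin=0.02),
    Guardrail("latency_p95", margin=0.05),
    Guardrail("payment_decline_rate", margin=0.01),
]
```

For example, a crash-rate estimate of +0.5% with an SE of 0.5% passes against the hypothetical 2% margin, because the one-sided 95% upper bound is about +1.3%. The design choice is that an underpowered guardrail blocks rather than passes: the launch must show that harm beyond the margin is unlikely, not merely that harm is not statistically significant.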
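For part 5, a minimal duration sketch, assuming the estimator is the change in the mean daily log(treated / control) DAU ratio between a fixed pre window and the post window, with independent daily residuals of standard deviation `sigma_daily` after removing day-of-week effects. The question does not give `sigma_daily`, so the values below are assumptions to be replaced by an estimate from the 2025-06-15..2025-07-14 pre window; under this model the 815k level affects the answer only through that sigma. The `vif` parameter is a hypothetical inflation factor for serial correlation.

```python
import math
from statistics import NormalDist


def required_post_days(mde_rel, sigma_daily, n_pre=30, alpha=0.05, power=0.8, vif=1.0):
    """Post-period days for a DiD on daily log(treated / control) DAU to detect `mde_rel`.

    Assumes i.i.d. daily residuals with s.d. `sigma_daily` (after removing day-of-week
    effects) and a fixed pre window of `n_pre` days; `vif` >= 1 inflates the variance
    for serial correlation. Returns math.inf if no post length reaches the target power.
    """
    delta = math.log1p(mde_rel)  # relative MDE on the log scale
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Need z * sqrt(vif * sigma^2 * (1/n_pre + 1/n_post)) <= delta; solve for 1/n_post.
    inv_n_post = (delta / z) ** 2 / (vif * sigma_daily ** 2) - 1.0 / n_pre
    if inv_n_post <= 0:
        return math.inf
    return math.ceil(1.0 / inv_n_post)


# sigma_daily values are assumptions; estimate them from the pre window.
for sigma in (0.005, 0.01):
    print(sigma, required_post_days(0.005, sigma), required_post_days(0.005, sigma, n_pre=60))
# CUPED or synthetic-control weights shrink the residual s.d. by sqrt(1 - R^2).
print(required_post_days(0.005, 0.01 * math.sqrt(1 - 0.75)))
```

Under these assumptions, `sigma_daily` = 0.5% needs 11 post days with the 30-day pre window, while `sigma_daily` = 1% cannot reach 80% power with a 30-day pre window at all and needs 67 post days with a 60-day one. Because the pre window caps precision, lengthening it or removing variance with CUPED or synthetic-control weights (here an assumed R² of 0.75 halves sigma) shortens the required duration; positive autocorrelation (`vif` > 1) lengthens it.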