Design and validate an ads feed experiment
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Onsite
You are the first data scientist at an ad‑supported mobile news app. Revenue comes from ad impressions and click‑throughs. Baselines: DAU = 2,000,000 (US 40%, EU 30%, RoW 30%), sessions/user/day = 3, ads/session = 6, feed CTR = 8% (per session), D1 retention = 35%, D7 = 18%. Product proposes a redesigned feed ranking and UI. Design and validate a single experiment, answering exactly the following:
1) Define a KPI tree aligned to the business goal of maximizing long‑term ad revenue without harming engagement. Choose one primary metric and 3–5 guardrails; justify each and state expected direction of change.
2) Traffic split is 90/10 (control/test) and 10 metrics will be monitored. Compute the minimum sample size per arm to detect a 5% relative lift on the primary metric (CTR baseline 8%), two‑sided α=0.05, power=0.80. Then recompute under (a) Bonferroni across 10 outcomes and (b) Benjamini–Hochberg at FDR=5% (state assumptions) and discuss trade‑offs.
3) Given the above traffic and that new users are excluded from randomization for their first 24 hours, estimate experiment duration to hit the required sample in US only and globally. Show formulas and numeric estimates.
4) List and operationalize pre‑launch sanity checks (e.g., sample ratio mismatch tolerance bands, covariate balance, metric distribution stability), and post‑launch analyses (CUPED with a 14‑day pre‑period, variance reduction choices, delta method vs. bootstrap CIs).
5) The team is worried about novelty effect and network interference. Propose a ramp plan (e.g., 1%→10%→50%→90%), a 7‑day washout/learning period policy, and a randomization unit (user vs. region vs. cluster) that mitigates interference. Explain how you’d detect and correct novelty and interference statistically.
6) Outline a sequential monitoring plan (e.g., α‑spending or group‑sequential boundaries). What would you do if SRM is detected at 10% absolute deviation? Provide a concrete diagnostic checklist and a rollback decision rule.
7) If multiple concurrent experiments touch the feed, explain how you’d avoid cross‑experiment contamination (e.g., factorial design, orthogonal bucketing) and when you’d prefer a geo‑experiment instead of user‑level randomization.
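For part 2, the power calculation can be sketched with the standard two-proportion normal approximation (Benjamini–Hochberg thresholds depend on the observed ordering of p-values, so only the unadjusted and Bonferroni cases admit an a priori sample size):

```python
from scipy.stats import norm

def n_per_arm(p1, rel_lift, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-proportion z-test (normal approximation,
    equal allocation within the comparison being powered)."""
    p2 = p1 * (1 + rel_lift)
    pbar = (p1 + p2) / 2
    z_a = norm.ppf(1 - alpha / 2)  # two-sided critical value
    z_b = norm.ppf(power)
    num = (z_a * (2 * pbar * (1 - pbar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p2 - p1) ** 2

n_plain = n_per_arm(0.08, 0.05)                  # unadjusted: ~74k per arm
n_bonf = n_per_arm(0.08, 0.05, alpha=0.05 / 10)  # Bonferroni across 10 outcomes
print(round(n_plain), round(n_bonf))
```

The Bonferroni requirement (~125k per arm) is roughly 1.7× the unadjusted one, which is the concrete trade-off part 2 asks the candidate to discuss.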
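For part 3, one simple duration model treats sessions as the unit of observation, since CTR here is defined per session. This is an optimistic simplification (sessions from the same user are correlated, and the 24-hour new-user exclusion trims eligible traffic slightly), and a strong candidate should say so:

```python
def duration_days(n_required, dau, arm_share, sessions_per_user=3.0):
    """Days to accrue n_required session-level observations in the smaller
    (10%) arm. Ignores within-user correlation and the 24h new-user
    exclusion -- both make this a lower bound."""
    sessions_per_day = dau * arm_share * sessions_per_user
    return n_required / sessions_per_day

N = 74_000  # per-arm requirement from the unadjusted power calculation
us_days = duration_days(N, dau=2_000_000 * 0.40, arm_share=0.10)
global_days = duration_days(N, dau=2_000_000, arm_share=0.10)
print(round(us_days, 2), round(global_days, 2))  # ~0.31 and ~0.12 days
```

Because session volume makes the statistical requirement trivial to hit, the binding constraint in practice is covering at least one or two full weekly cycles, not raw sample size.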
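For part 4, the CUPED adjustment the question names can be sketched on synthetic data (the 0.8 correlation and metric scales below are illustrative assumptions, not baselines from the prompt):

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED: subtract theta * (x - mean(x)), where x is the 14-day
    pre-period covariate (e.g., pre-experiment per-user CTR) and
    theta = cov(x, y) / var(x). The mean of y is preserved."""
    theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(0.08, 0.02, 10_000)             # pre-period metric per user
y = 0.8 * x + rng.normal(0.016, 0.01, 10_000)  # in-experiment metric
y_adj = cuped_adjust(y, x)
ratio = np.var(y_adj, ddof=1) / np.var(y, ddof=1)
print(ratio)  # well below 1: variance reduced, so CIs shrink
```

The variance ratio equals roughly 1 − corr(x, y)², which is why a strongly predictive pre-period covariate is worth the 14-day wait.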
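For part 6, the SRM diagnostic is a chi-square goodness-of-fit test of observed arm counts against the planned 90/10 split; a minimal sketch (the stringent alpha is a common convention to limit false alarms under repeated peeking, not a value given in the prompt):

```python
from scipy.stats import chisquare

def srm_check(n_control, n_test, expected=(0.90, 0.10), alpha=0.001):
    """Sample-ratio-mismatch check against the planned split.
    Returns (p-value, SRM flagged?)."""
    total = n_control + n_test
    stat, p = chisquare([n_control, n_test],
                        [expected[0] * total, expected[1] * total])
    return p, p < alpha

p1, flag1 = srm_check(900_500, 99_500)  # close to plan: not flagged
p2, flag2 = srm_check(920_000, 80_000)  # ~20% shortfall in test arm: flagged
print(flag1, flag2)
```

Note how sensitive the test is at these volumes: even a 0.1-percentage-point drift can approach significance, which is why a flagged SRM should trigger the diagnostic checklist (bucketing, logging, bot filtering, the 24h new-user exclusion) before any metric is trusted.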
Quick Answer: This question evaluates a data scientist's competency in experiment design, causal inference, and applied statistical analysis for product experimentation. It covers KPI definition, sample-size and power calculations, multiple-comparison adjustments, pre- and post-launch validation, variance-reduction techniques, ramping and randomization choices, interference detection, and sequential monitoring, testing both conceptual understanding and practical application within the Analytics & Experimentation domain. It is commonly asked because interviewers need evidence that a candidate can align experimentation with business goals (maximizing long-term ad revenue while protecting engagement), translate operational constraints into duration and traffic estimates, and reason about statistical trade-offs and operational safeguards such as ramp plans and cross-experiment contamination controls.