This question evaluates a data scientist's competencies in experimental design and causal inference: defining success metrics and guardrails, choosing randomization units under interference, power and sample-size calculation, variance reduction, heterogeneous-treatment-effect detection, multiple-testing control, and operational data-quality and rollout criteria. It is commonly asked in Analytics & Experimentation interviews because organizations must justify randomized evaluations of features that have network effects and operate under real-world constraints. The level is primarily practical application grounded in conceptual statistical understanding.
You must evaluate a core product change that likely has network effects (e.g., a matchmaking tweak in a large online game with 8M DAU). Address the following:

1) Define the primary success metric and guardrails (e.g., D1/D7 retention, ARPDAU, crash rate), choose the randomization unit (user, session, or cluster), and justify it under interference risk.
2) Provide a full test plan: pre-registration, ramp strategy, stopping rules (sequential testing / alpha spending), power/MDE targets, and duration.
3) Specify variance reduction (e.g., CUPED with pre-period engagement), outlier handling, novelty-decay checks, and spillover diagnostics.
4) Compute the required per-variant sample size for a baseline D1 retention of 40%, targeting a +1.0pp absolute lift at α=0.05 and power=0.80, and state your formula and assumptions.
5) Detail how you’ll detect heterogeneous treatment effects (cohorts such as geo, payer status, device), how you’ll manage multiple testing (FDR), and what you’ll do if randomization is infeasible (e.g., difference-in-differences with parallel-trends checks).
6) Define explicit ship/rollback criteria, data-quality SLOs, and how results will be communicated asynchronously to stakeholders in a remote-first environment.

Illustrative sketches for the sample-size calculation (item 4), CUPED (item 3), and FDR control and difference-in-differences (item 5) follow below.
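For item 4, a minimal sketch of the calculation, assuming a two-sided two-proportion z-test with the standard normal-approximation sample-size formula (the function name and rounding choice are illustrative):

```python
import math

from scipy.stats import norm

def two_proportion_sample_size(p1: float, p2: float,
                               alpha: float = 0.05,
                               power: float = 0.80) -> int:
    """Per-arm n for a two-sided two-proportion z-test:
    n = (z_{1-a/2}*sqrt(2*pbar*qbar) + z_{1-b}*sqrt(p1*q1 + p2*q2))^2 / (p2 - p1)^2
    """
    z_alpha = norm.ppf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = norm.ppf(power)           # ~0.84 for power = 0.80
    p_bar = (p1 + p2) / 2
    pooled_sd = math.sqrt(2 * p_bar * (1 - p_bar))
    unpooled_sd = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = ((z_alpha * pooled_sd + z_beta * unpooled_sd) / abs(p2 - p1)) ** 2
    return math.ceil(n)

# Baseline D1 retention 40%, +1.0pp absolute lift:
print(two_proportion_sample_size(0.40, 0.41))  # ~37,827 users per variant
```

The simpler form n ≈ 2·p̄(1−p̄)(z₁₋α/₂ + z₁₋β)²/δ² gives nearly the same answer. With 8M DAU, roughly 38k users per variant accrues quickly, so run length should instead be set by weekly seasonality and D7 maturation, per the test plan in item 2.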
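For the CUPED piece of item 3, a minimal sketch assuming a single pre-period covariate such as pre-experiment engagement; the simulated data and names are illustrative only:

```python
import numpy as np

def cuped_adjust(y: np.ndarray, x_pre: np.ndarray) -> np.ndarray:
    """CUPED-adjusted metric: y_adj = y - theta * (x_pre - mean(x_pre)),
    with theta = cov(y, x_pre) / var(x_pre) estimated on pooled data.
    Because x_pre predates assignment, it is independent of treatment,
    so the adjustment leaves the treatment effect unbiased while cutting
    variance by roughly corr(y, x_pre)^2."""
    theta = np.cov(y, x_pre, ddof=1)[0, 1] / np.var(x_pre, ddof=1)
    return y - theta * (x_pre - x_pre.mean())

# Hypothetical check: variance drops when pre-period engagement predicts y.
rng = np.random.default_rng(7)
x = rng.gamma(2.0, 1.0, 100_000)              # pre-period engagement
y = 0.5 * x + rng.normal(0, 1, 100_000)       # in-experiment metric
print(np.var(y), np.var(cuped_adjust(y, x)))  # adjusted variance is smaller
```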
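For the multiple-testing part of item 5, a sketch using Benjamini–Hochberg as the FDR procedure; the subgroup p-values below are invented purely for illustration:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from pre-registered HTE cuts (geo, payer status, device).
subgroup_pvals = {"geo_NA": 0.003, "geo_EU": 0.041, "geo_APAC": 0.200,
                  "payer": 0.012, "non_payer": 0.340,
                  "iOS": 0.049, "Android": 0.270}
reject, q_values, _, _ = multipletests(list(subgroup_pvals.values()),
                                       alpha=0.05, method="fdr_bh")
for name, q, r in zip(subgroup_pvals, q_values, reject):
    print(f"{name:10s} q={q:.3f} reject={r}")
# With these inputs, only geo_NA and payer survive at q < 0.05.
```

Subgroup cuts should be pre-registered; anything discovered post hoc belongs in the exploratory section of the readout, not the ship decision.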
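If randomization is infeasible (item 5), a minimal difference-in-differences sketch on a simulated region-week panel; all names, effect sizes, and the clustering column are assumptions for illustration. In practice you would first check parallel trends, e.g., with an event-study plot of pre-period leads:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Simulated panel: half the regions receive the change ('treated') starting
# in the 'post' weeks; the true effect is +1.0pp on retention.
panel = pd.DataFrame({"region": np.repeat(np.arange(20), 20),
                      "week": np.tile(np.arange(20), 20)})
panel["treated"] = (panel.region < 10).astype(int)
panel["post"] = (panel.week >= 10).astype(int)
panel["retention"] = (0.40 + 0.01 * panel.treated * panel.post
                      + rng.normal(0, 0.02, len(panel)))

# DiD via the treated x post interaction, with region-clustered errors.
did = smf.ols("retention ~ treated * post", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["region"]})
print(did.params["treated:post"])  # DiD estimate; ~+0.01 by construction
```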