You must evaluate a core product change that likely has network effects (e.g., a matchmaking tweak in a large online game with 8M DAU). Define the primary success metric and guardrails (e.g., D1/D7 retention, ARPDAU, crash rate), choose the randomization unit (user, session, or cluster), and justify it under interference risk. Provide a full test plan: pre-registration, ramp strategy, stopping rules (sequential/alpha spending), power/MDE targets, and duration. Specify variance reduction (e.g., CUPED with pre-period engagement), outlier handling, novelty decay checks, and spillover diagnostics. Compute the required per-variant sample size for a baseline D1 retention of 40% targeting a +1.0pp absolute lift at α=0.05 and power=0.80, and state your formula/assumptions. Detail how you’ll detect heterogeneous treatment effects (cohorts like geo, payer status, device), manage multiple testing (FDR), and what you’ll do if randomization is infeasible (e.g., diff-in-diff with parallel trends checks). Finally, define explicit ship/rollback criteria, data quality SLOs, and how results will be communicated asynchronously to stakeholders in a remote-first environment.
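For the sample-size part of the question, one possible worked sketch uses the standard unpooled two-proportion z-test approximation, n = (z_{1-α/2} + z_{power})² · [p1(1−p1) + p2(1−p2)] / δ². The prompt does not fix several of the choices below, so treat them as assumptions: a two-sided test, user-level randomization with independent users (no interference), equal allocation, and no variance reduction. The function name `n_per_variant` and the optional `design_effect` multiplier are hypothetical, introduced only for illustration.

```python
import math
from statistics import NormalDist


def n_per_variant(p1, delta, alpha=0.05, power=0.80, design_effect=1.0):
    """Approximate per-variant sample size for a two-sided two-proportion z-test.

    p1            -- baseline rate (e.g. 0.40 for 40% D1 retention)
    delta         -- absolute lift to detect (e.g. 0.01 for +1.0pp)
    design_effect -- assumed multiplier >= 1 for clustered randomization,
                     e.g. 1 + (m - 1) * ICC; 1.0 means independent users
    """
    p2 = p1 + delta
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    # Unpooled variance: sum of the two Bernoulli variances.
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance_sum / delta ** 2
    return math.ceil(n * design_effect)


# Inputs from the prompt: 40% baseline, +1.0pp absolute lift.
print(n_per_variant(0.40, 0.01))
```

Under these assumptions the result is roughly 37.8k users per variant. If the randomization unit is a cluster rather than a user, the count grows by the design effect, and a CUPED adjustment would shrink it by a factor of about (1 − ρ²), where ρ is the correlation between the pre-period covariate and the outcome; both adjustments depend on quantities that must be estimated from real data.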