This question evaluates a candidate's competency in online experimentation and statistical analysis, covering hypothesis formulation, variance reduction, clustering and stratification, multiplicity control, power/MDE calculation, and operational metric guardrails.

Context: You are evaluating a new ad-ranking algorithm (B) against the current production system (A) on an online video platform. Users are randomized 50/50 at the user level, and the experiment runs for 14 days. The primary metric is mean watch_time per impression (seconds). Weekly seasonality and weekday/weekend effects are known to exist. Daily data are available by variant and platform (Web, Mobile) with the fields: impressions, total_watch_time_sec, clicks, sessions, errors.
Guardrails:
Tasks:
(a) State H0/HA for the primary metric and justify one-tailed vs. two-tailed.
(b) Specify the exact test for the primary metric and justify assumptions and clustering.
(c) Define the variance reduction strategy (e.g., CUPED) and how you will compute it.
(d) Show how you will check guardrails with multiplicity control and the decision rule if violated.
(e) Describe stratification/segmentation to pre-register (e.g., by platform and weekday/weekend) and how to combine strata (fixed vs. random effects).
(f) Provide a power/MDE sketch with: baseline mean=70s, sd=25s at user-day level, avg 4 impressions/user-day, intra-user correlation=0.35, 200k users per arm over 14 days.
(g) Explain how you will diagnose and mitigate traffic imbalance or novelty effects.
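
One possible way to set up the arithmetic for task (f) is sketched below. It is not a reference answer, and it rests on assumptions the prompt does not state: the 0.35 intra-user correlation is read as the correlation between a user's daily values across the 14 days, the analysis unit is the per-user mean over the test, and the test is two-sided at alpha=0.05 with 80% power. Because the sd is already given at the user-day level, the 4 impressions per user-day figure is not used under this reading. The function names `design_effect` and `mde_user_level` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(periods: int, icc: float) -> float:
    """Variance inflation from correlated repeated measures: 1 + (m - 1) * rho."""
    return 1.0 + (periods - 1) * icc


def mde_user_level(sd_user_day: float, days: int, icc: float,
                   users_per_arm: int, alpha: float = 0.05,
                   power: float = 0.80) -> float:
    """Minimum detectable absolute difference in means (seconds).

    Assumes each user contributes `days` equally correlated daily values,
    equal arm sizes, and a normal approximation for the test statistic.
    """
    # Variance of one user's mean over the test window.
    var_user_mean = sd_user_day ** 2 * design_effect(days, icc) / days
    # Standard error of the difference between two arm means.
    se_diff = sqrt(2.0 * var_user_mean / users_per_arm)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * se_diff


# Inputs taken from task (f); alpha and power are assumed, not given.
baseline_mean = 70.0
mde = mde_user_level(sd_user_day=25.0, days=14, icc=0.35, users_per_arm=200_000)
print(f"design effect: {design_effect(14, 0.35):.2f}")
print(f"MDE: {mde:.3f} s ({100 * mde / baseline_mean:.2f}% of baseline)")
```

Under these assumptions the design effect is 5.55 and the MDE comes out at roughly 0.14 seconds, or about 0.2% of the 70s baseline. A different reading of the correlation (for example, across impressions within a user-day) or a one-sided test would change the result, so the interpretation should be stated explicitly in an answer.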