Design and analyze ad A/B test
Company: Capital One
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Take-home Project
You are testing a new ad-ranking algorithm (B) against the current production system (A) on an online video platform. Primary metric: mean watch_time per impression (seconds). Guardrails: (1) error rate ≤ 1%, (2) ad load (ads per session) must not increase, (3) click-through rate (CTR) must not decrease by more than 2% relative. Traffic is split 50/50 between A and B and randomized at the user level for 14 days. Seasonality is weekly and there is a known weekday/weekend effect. Data available daily: impressions, total_watch_time_sec, clicks, sessions, errors by variant and platform (Web, Mobile).
Design the experiment analysis plan:
(a) State H0/HA and justify whether a one-tailed or two-tailed test is appropriate for the primary metric.
(b) Specify the exact test for the primary metric (e.g., two-sample t-test on per-user means, CUPED with a covariate, or cluster-robust approach) and justify assumptions and clustering.
(c) Define the variance reduction strategy you would use (e.g., CUPED using pre-experiment watch_time) and how you would compute it.
(d) Show how you will check guardrails with multiplicity control (e.g., Holm-Bonferroni), and what decision rule you will use if a guardrail is violated.
(e) Describe the stratification/segmentation you will pre-register (e.g., by platform and weekday/weekend) and how you will combine strata (fixed vs random effects meta-analysis).
(f) Provide a power/MDE calculation sketch assuming baseline mean=70s, sd=25s at user-day level, average 4 impressions/user-day, intra-user correlation 0.35, 200k users per arm over 14 days.
(g) Explain how you would diagnose and mitigate traffic imbalance or novelty effects.
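One possible reading of the numbers in (f) can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive answer: it treats the metric as a per-user mean of user-day values, assumes every user is active on all 14 days, and applies the 0.35 correlation between user-days of the same user. The 4 impressions/user-day figure is not used, because the sd is already given at the user-day level. The function name mde_seconds and the defaults alpha=0.05 and power=0.80 are illustrative choices, not part of the prompt.

```python
from math import sqrt
from statistics import NormalDist


def mde_seconds(sd_user_day=25.0, days=14, icc=0.35,
                users_per_arm=200_000, alpha=0.05, power=0.80):
    """Two-sided MDE (seconds) for a difference in per-user mean watch time.

    Assumption: each user contributes `days` equally correlated user-days.
    """
    # Variance of one user's mean over `days` user-days with correlation `icc`.
    var_user_mean = sd_user_day ** 2 * (1 + (days - 1) * icc) / days
    # Standard error of the difference between two equally sized arms.
    se_diff = sqrt(2 * var_user_mean / users_per_arm)
    z = NormalDist()  # standard normal
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se_diff


mde = mde_seconds()
print(f"MDE = {mde:.3f} s ({mde / 70:.2%} of the 70 s baseline)")
```

Under these assumptions the MDE comes out at roughly 0.14 seconds, about 0.2% of the 70s baseline. Users who are active on fewer days, or a higher intra-user correlation, would push it up.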
Quick Answer: This question evaluates a candidate's competency in online experimentation and statistical analysis, covering hypothesis formulation, variance reduction, clustering and stratification, multiplicity control, power/MDE calculation, and operational metric guardrails.
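Two of the techniques named above, CUPED and Holm-Bonferroni, can be illustrated with a short sketch. It assumes one pre-experiment covariate per user (for example pre-period watch time) and one p-value per guardrail from whatever one-sided test is chosen for it. The function names cuped_adjust and holm_reject and all numbers are hypothetical and made up for illustration.

```python
from statistics import mean


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y_i - theta * (x_i - mean(x)).

    Pass data pooled over both arms so that theta does not depend on assignment.
    """
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    theta = cov_xy / var_x  # regression slope of y on x
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: return one reject flag per p-value, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] > alpha / (m - rank):
            break  # stop at the first failure; larger p-values are kept
        reject[i] = True
    return reject


# Made-up per-user values: pre-period (x) and in-experiment (y) watch time.
x = [60.0, 80.0, 70.0, 90.0, 50.0, 75.0]
y = [62.0, 83.0, 69.0, 95.0, 52.0, 77.0]
y_adj = cuped_adjust(y, x)  # same mean as y, smaller variance

# Made-up p-values for the three guardrails (errors, ad load, CTR).
flags = holm_reject([0.01, 0.04, 0.03])  # [True, False, False]
```

The adjusted values keep the same mean as the raw outcome, so the treatment-effect estimate stays on the original scale while its variance shrinks by roughly the squared correlation between x and y.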