Feed Ranker A/B Test Design and Powering
You are replacing the current ranker with a new model in a feed. Baseline CTR is 2.0%. You expect a +5% relative lift on CTR and want 90% power at α = 0.05. Daily eligible traffic is 1,000,000 users; assignment is user-level 50/50 and stable over time.
Assume CTR is measured at the user level over the test window (Bernoulli per user: clicked ≥1 vs. not), users are independently and identically distributed within arms, and there is no interference (SUTVA) unless otherwise addressed.
A) Sample Size and Minimum Duration
- Use a two-proportion power analysis for a two-sided test to detect an absolute lift of 0.1 percentage points (from 2.0% to 2.1%).
- State and use your variance assumptions and show formulas and steps.
- Adjust the resulting sample size for (i) 5% expected bot/invalid traffic and (ii) up to 1% sample ratio mismatch (SRM).
- Compute the minimum test duration given daily traffic and 50/50 assignment.
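One way to sketch the part A arithmetic, assuming the unpooled-variance normal approximation n = (z_{α/2} + z_β)² [p1(1 − p1) + p2(1 − p2)] / (p2 − p1)² per arm. The function name and the two adjustment constants are illustrative. Treating bot traffic as a 5% loss of valid users, and SRM as a worst-case 1% shortfall in the smaller arm, is an interpretation of the prompt, not something it specifies.

```python
import math
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-sided two-proportion z-test.

    Uses the unpooled variance p1(1-p1) + p2(1-p2); this is an assumption,
    and the pooled-variance variant gives nearly the same answer here.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    var_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * var_sum / (p2 - p1) ** 2)


# Baseline 2.0% vs. 2.1% (a +5% relative lift), alpha = 0.05, power = 0.90.
n_raw = n_per_arm(0.020, 0.021)

# Assumed adjustments (interpretations of the prompt, not given by it):
VALID_SHARE = 0.95  # 5% of assigned users discarded as bot/invalid
SRM_SHARE = 0.99    # smaller arm may end up 1% short of its planned size
n_adj = math.ceil(n_raw / (VALID_SHARE * SRM_SHARE))

# 50/50 split of 1,000,000 eligible users per day, counting each user once.
DAILY_USERS = 1_000_000
days_min = math.ceil(2 * n_adj / DAILY_USERS)

print(f"raw n/arm={n_raw}, adjusted n/arm={n_adj}, min days={days_min}")
```

Under these assumptions the requirement is roughly 422,000 users per arm before adjustment and about 449,000 after, so the arithmetic minimum is a single day of traffic. In practice the duration is then set by day-of-week and novelty coverage rather than by power. Returning users also mean that daily traffic does not add up to new unique users on later days.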
B) Guardrails and Sequential Monitoring
- Propose guardrail metrics (bounce rate, crashes, p95 latency, revenue per user) with decision thresholds and how you will test them (e.g., non-inferiority).
- Describe a sequential monitoring plan using α-spending (e.g., Pocock or O’Brien–Fleming via Lan–DeMets) that allows early stopping without inflating Type I error.
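The two ideas in part B can be illustrated with a short sketch. It assumes a rate-type guardrail where a higher value is worse (such as bounce rate), plus the standard Lan–DeMets spending functions. The function names, the four equally spaced looks, the counts, and the 0.5 percentage-point margin are all hypothetical. The code reports cumulative α spent at each look; turning that into z-boundaries requires numerical integration over the joint distribution of the interim statistics, which is left to dedicated group-sequential software.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def obf_spend(t, alpha=0.05):
    """O'Brien-Fleming-type Lan-DeMets spending: cumulative alpha at fraction t (0 < t <= 1)."""
    z = _N.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _N.cdf(z / math.sqrt(t)))


def pocock_spend(t, alpha=0.05):
    """Pocock-type Lan-DeMets spending: cumulative alpha at fraction t (0 < t <= 1)."""
    return alpha * math.log(1 + (math.e - 1) * t)


def noninferiority_p(x_c, n_c, x_t, n_t, margin):
    """One-sided p-value for H0: p_t - p_c >= margin (higher rate is worse).

    A small p-value supports the claim that treatment is not worse than
    control by more than `margin`.
    """
    p_c, p_t = x_c / n_c, x_t / n_t
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    return _N.cdf((p_t - p_c - margin) / se)


# Hypothetical schedule: four looks at equal information fractions.
for t in (0.25, 0.50, 0.75, 1.00):
    print(f"t={t:.2f}  OBF={obf_spend(t):.5f}  Pocock={pocock_spend(t):.5f}")

# Hypothetical bounce counts: 30.0% control vs. 30.1% treatment, margin 0.5 pp.
p_bounce = noninferiority_p(30_000, 100_000, 30_100, 100_000, margin=0.005)
print(f"non-inferiority p-value: {p_bounce:.4f}")
```

The O'Brien–Fleming-type function spends very little α at early looks and keeps most of it for the final analysis, while the Pocock-type function spends it more evenly. Both reach the full α at t = 1.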
C) Novelty, Day-of-Week Effects, and CUPED
- Address novelty and day-of-week effects.
- Propose CUPED (covariate adjustment) using pre-experiment user CTR: specify the covariate, the adjustment, and how you would validate variance reduction without bias.
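A minimal CUPED sketch on synthetic data, assuming the covariate X is each user's pre-experiment CTR and the adjustment is Y_adj = Y − θ(X − mean(X)) with θ = cov(X, Y) / var(X), estimated on both arms pooled. The simulated click rates, impression counts, and names are hypothetical and only show the mechanics.

```python
import random
from statistics import fmean, variance


def cuped_adjust(y, x):
    """Return (y_adj, theta) where y_adj = y - theta * (x - mean(x)).

    theta is estimated on the pooled sample (both arms together) so the
    adjustment cannot absorb any of the treatment effect.
    """
    mx, my = fmean(x), fmean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov / variance(x)
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)], theta


# Synthetic users: a latent click rate drives both pre-period CTR (x)
# and the in-test outcome y = 1 if the user clicked at least once.
random.seed(7)
N_USERS, IMPRESSIONS = 4000, 30
x, y, arm = [], [], []
for i in range(N_USERS):
    rate = random.betavariate(2, 50)
    treated = i % 2  # stand-in for randomized assignment
    test_rate = rate * (1.05 if treated else 1.0)
    pre_clicks = sum(random.random() < rate for _ in range(IMPRESSIONS))
    clicked = any(random.random() < test_rate for _ in range(IMPRESSIONS))
    x.append(pre_clicks / IMPRESSIONS)
    y.append(1.0 if clicked else 0.0)
    arm.append(treated)

y_adj, theta = cuped_adjust(y, x)


def arm_diff(values):
    """Treatment-minus-control difference in means."""
    t = [v for v, a in zip(values, arm) if a == 1]
    c = [v for v, a in zip(values, arm) if a == 0]
    return fmean(t) - fmean(c)


var_ratio = variance(y_adj) / variance(y)
print(f"theta={theta:.3f}  variance ratio={var_ratio:.3f}")
print(f"raw diff={arm_diff(y):.4f}  CUPED diff={arm_diff(y_adj):.4f}")
```

To validate without bias, the covariate should be measured strictly before assignment. The variance ratio and the adjusted estimate can then be checked on an A/A split or on pre-period data, where the adjusted difference should be centered on zero.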
D) Interference and Contamination
- Mitigate interference: ensure user-level bucketing, prevent cross-arm content spillover, and plan a geo or holdout if network effects are suspected.
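Stable user-level bucketing is commonly done with a salted hash of the user ID, so the same user always lands in the same arm across sessions and devices that share the ID. The sketch below assumes string user IDs; the salt value and function name are hypothetical.

```python
import hashlib


def assign_arm(user_id: str, salt: str = "feed-ranker-test", treat_share: float = 0.5) -> str:
    """Deterministically map a user to an arm.

    The salt is experiment-specific (hypothetical value here) so that
    assignments are uncorrelated with other experiments' buckets.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "treatment" if bucket < treat_share else "control"


# Quick check on synthetic IDs: the split should be close to 50/50.
users = [f"user_{i}" for i in range(10_000)]
share = sum(assign_arm(u) == "treatment" for u in users) / len(users)
print(f"treatment share: {share:.3f}")
```

Hashing only fixes who is assigned where. It does not stop spillover through shared content, caches, or social connections, which is why a geo or cluster-level design is the fallback when network effects are suspected.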
E) Post-Test Checks and Ramp
- After the test, outline checks for achieved power, heterogeneity of treatment effects across cohorts, and how you’d decide to ramp to 100%.
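Two of the post-test checks can be sketched as follows. The power check uses the realized sample sizes against the planned effect, not the observed one, because power computed from the observed effect just restates the p-value. The heterogeneity check is a simple z-test on the difference between two cohort-level effects. All counts, effect sizes, and standard errors below are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()


def achieved_power(p_c, delta, n_c, n_t, alpha=0.05):
    """Power of a two-sided z-test to detect the planned absolute lift
    `delta`, given the realized per-arm sample sizes."""
    p_t = p_c + delta
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_alpha = _N.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se
    # Both rejection tails; the second term is negligible for large shift.
    return _N.cdf(shift - z_alpha) + _N.cdf(-shift - z_alpha)


def interaction_z(d_a, se_a, d_b, se_b):
    """z-statistic for the difference between two independent cohort effects."""
    return (d_a - d_b) / math.sqrt(se_a**2 + se_b**2)


# Hypothetical realized counts: on plan vs. a shortfall after filtering.
power_planned = achieved_power(0.020, 0.001, 421_968, 421_968)
power_short = achieved_power(0.020, 0.001, 300_000, 300_000)
print(f"power on plan={power_planned:.3f}  with shortfall={power_short:.3f}")

# Hypothetical cohort effects (absolute CTR lift) and standard errors.
z_het = interaction_z(0.0012, 0.0004, 0.0008, 0.0004)
p_het = 2 * (1 - _N.cdf(abs(z_het)))
print(f"heterogeneity z={z_het:.3f}  p={p_het:.3f}")
```

When many cohorts are compared, the interaction tests need a multiple-comparison correction, and cohorts are best defined before unblinding. A ramp decision would then combine the primary result, the guardrails, and the absence of a cohort with a credibly negative effect.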