A/B Test: Reduce Mobile Live‑Stream Pre‑Roll Ad Frequency by 20%
Context: You are designing an experiment on mobile live streams to evaluate reducing pre‑roll ad frequency by 20% and its effect on user experience and downstream value. Viewers frequently switch streams, and creators share overlapping audiences. Answer all parts precisely.
Randomization

- Choose the randomization unit (viewer‑level, creator‑level, geo, or hybrid).
- Justify your choice to minimize interference when viewers switch streams and when creators share audiences.
- Define exposure and explain how you will ensure assignment stickiness across days.
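One common way to guarantee assignment stickiness is to derive the bucket deterministically from a stable viewer ID. A minimal sketch (the function name, salt, and 50/50 split below are illustrative assumptions, not part of the prompt):

```python
import hashlib

def assign_variant(viewer_id: str, experiment_salt: str = "preroll_freq_v1") -> str:
    """Deterministically map a stable viewer ID to a variant.

    Hashing (salt + viewer_id) makes the assignment sticky across days
    and sessions tied to the same ID, and independent across experiments
    that use different salts.
    """
    digest = hashlib.sha256(f"{experiment_salt}:{viewer_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 1000              # 1000 fine-grained buckets
    return "treatment" if bucket < 500 else "control"   # 50/50 split

# The same ID always lands in the same arm:
assert assign_variant("viewer_42") == assign_variant("viewer_42")
```

Because assignment is a pure function of the ID and salt, no assignment table is needed and re-exposure on later days cannot flip a viewer's arm.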
Metrics

- Pick a single primary metric that captures business value (e.g., watch_time_per_viewer_day).
- List at least three guardrail metrics (e.g., crash_rate, rebuffer_ratio, ad_impressions_per_viewer, retention_day1) and explain their role.
- Explain how you will handle heavy tails (e.g., log‑transform, winsorize at the 99.5th percentile, or use quantile metrics) and how that choice affects inference.
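As a concrete illustration of the heavy‑tail handling asked for above, winsorizing caps each observation at the 99.5th percentile before averaging. A sketch on synthetic data (the lognormal parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic heavy-tailed watch times (minutes per viewer-day).
watch_time = rng.lognormal(mean=3.0, sigma=1.0, size=100_000)

cap = np.quantile(watch_time, 0.995)          # 99.5th-percentile cap
winsorized = np.minimum(watch_time, cap)      # clip the upper tail only

# Winsorizing shrinks the variance (tighter CIs, more power), but the
# estimand becomes the winsorized mean, not the raw mean: a small,
# deliberate bias accepted in exchange for lower variance.
print(watch_time.var(), winsorized.var())
```

In practice the cap should be fixed from pre‑experiment data so that it is identical in both arms.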
Sample Size

Assumptions:

- Baseline mean daily watch time = 36 minutes
- Standard deviation = 60 minutes per viewer‑day
- Intra‑user correlation induces a design effect of 1.3
- Total eligible daily mobile viewers = 5,000,000
- Two‑sided α = 0.05, power 1−β = 0.8
- Detect a +2% relative lift in the primary metric

Compute the required per‑variant sample size in viewer‑days after applying the design effect. Show the formulas and a numeric answer.
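The requested number can be checked with the standard two‑sample formula, inflated by the design effect. A sketch using only the stated assumptions and the Python standard library:

```python
import math
from statistics import NormalDist

alpha, power = 0.05, 0.80
sigma = 60.0                      # SD of daily watch time (minutes)
delta = 0.02 * 36.0               # +2% relative lift on a 36-minute baseline
deff = 1.3                        # design effect from intra-user correlation

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96
z_beta = NormalDist().inv_cdf(power)            # ~0.84

# n per variant = 2 * (z_{alpha/2} + z_beta)^2 * sigma^2 / delta^2, then x DEFF
n_srs = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
n_per_variant = math.ceil(n_srs * deff)
print(n_per_variant)              # roughly 1.4e5 viewer-days per variant
```

This comes to about 142k viewer‑days per variant, comfortably inside the 5M eligible daily viewers.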
Variance Reduction

- Describe how to use CUPED with pre‑experiment watch time and device type to reduce variance.
- Provide the exact regression you would fit and explain how you would compute and apply θ (theta).
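A minimal CUPED sketch on synthetic data: θ is the OLS slope of the in‑experiment outcome on the pre‑experiment covariate (device could enter as additional indicator columns), and the adjusted metric subtracts θ·(X − X̄). The data‑generating numbers below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x_pre = rng.gamma(shape=2.0, scale=18.0, size=n)        # pre-period watch time
y = 0.6 * x_pre + rng.normal(0, 30, size=n) + 10        # in-experiment watch time

# theta = Cov(Y, X) / Var(X), the OLS slope of Y on X.
# Compute it pooled across arms (or on control only) to avoid bias.
theta = np.cov(y, x_pre)[0, 1] / np.var(x_pre, ddof=1)

# Adjusted metric: same expectation as Y, strictly lower variance.
y_cuped = y - theta * (x_pre - x_pre.mean())

print(np.var(y), np.var(y_cuped))   # variance drops by the R^2 of Y on X
```

The treatment effect is then estimated as the difference in mean `y_cuped` between arms; centering X keeps the adjusted metric's mean equal to the raw mean.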
Novelty and Ramp

- Propose a 2‑week ramp with sequential monitoring that controls type I error (e.g., group sequential or alpha‑spending).
- Specify decision boundaries or stopping rules at interim checks and explain how you would adjust for peeking.
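One conservative way to pre‑register interim boundaries is an O'Brien–Fleming‑style alpha‑spending function, applying the incremental alpha at each look. A sketch (the per‑look boundaries here ignore the correlation between looks, so they are slightly stricter than exact group‑sequential boundaries; the four equally spaced looks are an assumption):

```python
from statistics import NormalDist

alpha = 0.05
nd = NormalDist()
z_half = nd.inv_cdf(1 - alpha / 2)

def obf_spend(t: float) -> float:
    """Lan-DeMets O'Brien-Fleming-type alpha-spending function."""
    return 2 * (1 - nd.cdf(z_half / t ** 0.5))

looks = [0.25, 0.5, 0.75, 1.0]          # information fractions over the 2-week ramp
spent = 0.0
for t in looks:
    inc = obf_spend(t) - spent           # alpha newly spent at this look
    spent = obf_spend(t)
    z_bound = nd.inv_cdf(1 - inc / 2)    # conservative per-look |z| boundary
    print(f"t={t:.2f}  incremental alpha={inc:.5f}  |z| boundary={z_bound:.2f}")
```

The boundaries start very strict (around |z| ≈ 3.9 at the first look) and relax toward the end, which is exactly the "adjust for peeking" behavior the question asks you to specify.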
Integrity

- Describe how you would detect and mitigate bot/AFK traffic and creator‑led raids that could bias results.
- Specify the filters you would apply, and use either post‑stratification or cluster‑robust standard errors when clustering by creator/day.
- Explain how you would check for spillovers and, if detected, how you would switch to a cluster‑randomized test by creator, including the power implications.
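On the power implications of switching to creator‑level randomization: variance inflates by the cluster design effect 1 + (m − 1)ρ, where m is the average viewer‑days per creator and ρ is the intra‑creator correlation. A sketch with illustrative assumed values for m and ρ (the viewer‑level requirement is taken from the Sample Size section's formula):

```python
import math

# Viewer-level requirement from the earlier two-sample calculation (~1.4e5).
n_individual = 141_716            # viewer-days per variant, viewer-level design

m = 200                           # ASSUMED mean viewer-days per creator cluster
icc = 0.02                        # ASSUMED intra-creator correlation (rho)

# Cluster design effect: 1 + (m - 1) * rho
deff_cluster = 1 + (m - 1) * icc
n_cluster_design = math.ceil(n_individual * deff_cluster)
n_creators = math.ceil(n_cluster_design / m)

print(deff_cluster)               # ~5x more viewer-days than viewer-level
print(n_cluster_design, n_creators)
```

Even a small ρ is costly when clusters are large, so the number of creators per arm, not viewer‑days, becomes the binding constraint.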