A/B Test: Increasing Video Pins for New Users
Context
Pinterest ran an online controlled experiment on new users to increase the share of video pins in the home feed. Users were randomized 50/50 into Treatment and Control. The evaluation window is 14 days. The primary metric is 7-day time spent per user (minutes), evaluated with a one-sided test (Treatment > Control) and a planning MDE of +2% relative to Control. Secondary metrics include CTR (clicks/impressions) and D7 retention (users retained on day 7 / assigned users).
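One way to formalize the primary-metric test described above is sketched below. The symbols are assumptions introduced here, not part of the original: \mu_T and \mu_C denote the true mean 7-day time spent per user in Treatment and Control, and \delta_{\text{MDE}} denotes the planning MDE. This sketch treats the MDE as a power-planning quantity rather than as part of the null; a superiority-margin framing would instead place \delta_{\text{MDE}} inside the hypotheses.

```latex
H_0:\ \mu_T - \mu_C \le 0
\qquad \text{vs.} \qquad
H_1:\ \mu_T - \mu_C > 0,
\qquad
\delta_{\text{MDE}} = 0.02\,\mu_C
```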
Given Data
- Assignments (users): Control = 98,750; Treatment = 101,250
- Primary metric (per-user): Control mean = 12.00, sd = 8.00; Treatment mean = 12.36, sd = 8.00
- CTR (aggregate): Control = 150,000 clicks / 5,000,000 impressions; Treatment = 171,600 clicks / 5,200,000 impressions
- D7 retention: Control = 21,725 / 98,750; Treatment = 22,680 / 101,250
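The assignment counts above can be checked against the intended 50/50 split with a chi-square goodness-of-fit test. The following is a minimal sketch, not a reference implementation: the function name `srm_chi_square` is hypothetical, and it uses only the standard library, relying on the identity that for one degree of freedom the chi-square tail probability equals `erfc(sqrt(chi2 / 2))`.

```python
import math


def srm_chi_square(n_control, n_treatment, expected_share_control=0.5):
    """Chi-square goodness-of-fit test (1 df) for sample ratio mismatch.

    Hypothetical helper: returns (chi2 statistic, p-value).
    """
    total = n_control + n_treatment
    expected_control = total * expected_share_control
    expected_treatment = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assignment counts from the Given Data; alpha = 0.001 as stated in the tasks.
chi2, p_value = srm_chi_square(98_750, 101_250)
srm_suspected = p_value < 0.001
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, SRM suspected: {srm_suspected}")
```

With these counts each arm is expected to hold 100,000 users, so the statistic is 2 × 1,250² / 100,000 = 31.25, and the p-value falls well below 0.001.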
Tasks
- State H0 and H1 for the primary metric precisely, including direction and the MDE.
- Check for sample ratio mismatch (SRM) using a chi-square test at α = 0.001. Should SRM be suspected given the observed assignments for an intended 50/50 split?
- For the primary metric, compute the absolute lift, relative lift (%), a 95% CI for the difference in means, and the one-sided p-value. Is the result significant at α = 0.05?
- For CTR and D7 retention, run appropriate two-proportion tests and adjust for multiple comparisons across these two secondary metrics using Holm–Bonferroni at familywise α = 0.05. Which, if any, remain significant?
- Provide a ship/no-ship recommendation. If you detect SRM or other validity threats, discuss their impact and list additional diagnostics/guardrails you would run before launching.
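The primary-metric and secondary-metric computations requested above can be sketched as follows. This is one possible approach under stated assumptions, not the only valid analysis: it uses a large-sample z-test on the difference in means with unpooled variances, pooled two-sided two-proportion z-tests, and a hand-rolled Holm–Bonferroni step-down. All function names (`normal_sf`, `mean_diff_test`, `two_proportion_test`, `holm_bonferroni`) are hypothetical helpers, and only the standard library is used.

```python
import math


def normal_sf(z):
    """Upper-tail probability of the standard normal, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mean_diff_test(mean_c, sd_c, n_c, mean_t, sd_t, n_t, z_crit=1.959964):
    """Large-sample z-test for Treatment minus Control means (unpooled variance)."""
    diff = mean_t - mean_c
    se = math.sqrt(sd_c ** 2 / n_c + sd_t ** 2 / n_t)
    z = diff / se
    return {
        "abs_lift": diff,
        "rel_lift_pct": 100.0 * diff / mean_c,
        "ci95": (diff - z_crit * se, diff + z_crit * se),  # two-sided 95% CI
        "z": z,
        "p_one_sided": normal_sf(z),  # H1: Treatment > Control
    }


def two_proportion_test(x_c, n_c, x_t, n_t):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (x_c + x_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_c + 1.0 / n_t))
    z = (x_t / n_t - x_c / n_c) / se
    return z, 2.0 * normal_sf(abs(z))


def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down over a dict {name: p}; returns {name: rejected?}."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        if p <= alpha / (m - rank):
            rejected[name] = True
        else:
            break  # once one test fails, all larger p-values also fail
    return rejected


# Inputs taken from the Given Data.
primary = mean_diff_test(12.00, 8.00, 98_750, 12.36, 8.00, 101_250)
z_ctr, p_ctr = two_proportion_test(150_000, 5_000_000, 171_600, 5_200_000)
z_ret, p_ret = two_proportion_test(21_725, 98_750, 22_680, 101_250)
decisions = holm_bonferroni({"CTR": p_ctr, "D7 retention": p_ret}, alpha=0.05)

print(primary)
print(f"CTR: z = {z_ctr:.2f}, p = {p_ctr:.3g}")
print(f"D7 retention: z = {z_ret:.2f}, p = {p_ret:.3g}")
print(decisions)
```

Two caveats apply to this sketch. The CTR test treats impressions as independent, although randomization is at the user level and impressions cluster within users; a delta-method or user-level bootstrap variance would be a safer choice if per-user data are available. In addition, any significance result should be read alongside the SRM check requested in the tasks, since an imbalanced split can bias all of these comparisons.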