Evaluate a New Preloading Strategy for a Short‑Video App (New Users)
Context
On 2025‑08‑20, a new preloading strategy was rolled out to 30% of traffic. Among new users (accounts created within the last 7 days), product analytics observed:
- −6% change in average daily watch time
- Crash rate decreased by 0.2 percentage points
- Average initial video start latency improved by 80 ms
Design an end‑to‑end approach to evaluate the change rigorously and decide whether to ship, iterate, or roll back.
Tasks
(a) Define primary, secondary, and guardrail metrics. Justify each, propose useful segmentations (e.g., device, network, country, cohort age, entry surface), and specify exact formulas and units.
(b) Outline an A/B test plan: unit of randomization, bucketing, exposure rules, test length, and how to handle heavy‑tailed watch time (e.g., winsorization, log‑transform, robust estimators).
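A minimal sketch of the heavy‑tail handling mentioned in (b), using a synthetic lognormal sample as a stand‑in for real watch‑time data (all numbers here are illustrative assumptions, not observed values):

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical heavy-tailed daily watch time in minutes (lognormal stand-in)
watch = rng.lognormal(mean=2.0, sigma=1.0, size=10_000)

# One-sided winsorization: cap extreme sessions at the 99th percentile
cap = np.quantile(watch, 0.99)
winsorized = np.minimum(watch, cap)

# Log-transform alternative; log1p is safe for zero watch time
logged = np.log1p(watch)

print(f"raw mean={watch.mean():.2f}, winsorized mean={winsorized.mean():.2f}, cap={cap:.1f}")
```

Winsorization keeps the metric in its original units (minutes), which stakeholders can interpret directly; the log‑transform changes the estimand to something closer to a geometric mean, so the choice should be pre‑registered.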
(c) Estimate the per‑variant sample size to detect a +3% lift in mean daily watch time with α = 0.05 (two‑sided) and 80% power, assuming baseline mean = 14 min, SD = 18 min, independent users, and equal allocation. Show the formula and any additional assumptions if using nonparametric tests.
(d) Specify guardrails (e.g., crash rate, time‑to‑first‑frame, data usage) and stopping rules. Describe how to handle novelty effects, weekday/seasonality, and experiment mis‑randomization checks (e.g., A/A, covariate balance tests).
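One concrete mis‑randomization check from (d) is a sample‑ratio‑mismatch (SRM) test on bucket counts. A sketch using a two‑sided z‑test against the intended 50/50 split (the counts below are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical bucket counts under an intended 50/50 allocation
n_control, n_treatment = 100_800, 99_200
n = n_control + n_treatment
p_hat = n_treatment / n

# Two-sided z-test of the observed share against the expected 0.5
z = (p_hat - 0.5) / sqrt(0.25 / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z={z:.2f}, p={p_value:.5f}")  # a very small p flags SRM
```

In practice a strict threshold (e.g. p < 0.001) is used so that SRM alerts indicate genuine bucketing or logging bugs rather than noise; an SRM failure invalidates the experiment regardless of the metric readout.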
(e) If the feature was partially rolled out by region before the test, propose a difference‑in‑differences or CUPED/regression‑adjusted analysis. State key identifying assumptions and how you would validate them.
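The CUPED adjustment named in (e) can be sketched on simulated data: subtract θ·(X − mean(X)) from the outcome, where X is a pre‑experiment covariate (e.g. pre‑period watch time) and θ = cov(Y, X)/var(X). All distributions and effect sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
# Hypothetical pre-experiment watch time, correlated with the outcome
pre = rng.gamma(shape=2.0, scale=7.0, size=n)
treat = rng.integers(0, 2, size=n)
# Simulated post-period outcome with a +0.42 min true treatment effect
post = 0.8 * pre + 0.42 * treat + rng.normal(0.0, 6.0, size=n)

# CUPED: theta = cov(Y, X) / var(X); subtract the centered covariate term
theta = np.cov(post, pre)[0, 1] / pre.var(ddof=1)
adjusted = post - theta * (pre - pre.mean())

naive = post[treat == 1].mean() - post[treat == 0].mean()
cuped = adjusted[treat == 1].mean() - adjusted[treat == 0].mean()
var_reduction = 1 - adjusted.var(ddof=1) / post.var(ddof=1)
print(f"naive diff={naive:.3f}, CUPED diff={cuped:.3f}, "
      f"variance reduced by {var_reduction:.0%}")
```

Because the covariate is measured before exposure, the adjustment cannot introduce bias under proper randomization; it only removes pre‑existing variance, tightening confidence intervals for the same sample size.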