This question evaluates a data scientist's competency in experimental design, causal inference, metric selection and definition, segmentation, and statistical power and sample-size calculation, within the Analytics & Experimentation domain as applied to product analytics for user engagement.

On 2025‑08‑20, a new preloading strategy was rolled out to 30% of traffic. Among new users (accounts created within the last 7 days), product analytics observed changes in key engagement metrics.
Design an end‑to‑end approach to evaluate the change properly and decide whether to ship, iterate, or roll back.
(a) Define primary, secondary, and guardrail metrics. Justify each, propose useful segmentations (e.g., device, network, country, cohort age, entry surface), and specify exact formulas and units (an example metric definition appears after part (e)).
(b) Outline an A/B test plan: unit of randomization, bucketing, exposure rules, test length, and how to handle heavy‑tailed watch time (e.g., winsorization, log‑transform, robust estimators; see the sketch after part (e)).
(c) Estimate the per‑variant sample size to detect a +3% lift in mean daily watch time with α = 0.05 (two‑sided) and 80% power, assuming baseline mean = 14 min, SD = 18 min, independent users, and equal allocation. Show the formula, and state any additional assumptions if using nonparametric tests (a worked calculation follows part (e)).
(d) Specify guardrails (e.g., crash rate, time‑to‑first‑frame, data usage) and stopping rules. Describe how to handle novelty effects, weekday/seasonality, and experiment mis‑randomization checks (e.g., A/A tests, covariate balance tests; an SRM-check sketch follows part (e)).
(e) If the feature was partially rolled out by region before the test, propose a difference‑in‑differences or CUPED/regression‑adjusted analysis. State the key identifying assumptions and how you would validate them (a CUPED sketch follows below).
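
For part (a), one illustrative definition of a candidate primary metric, assuming mean daily watch time per user is chosen as primary (the 14‑minute baseline in part (c) points that way); the notation is hypothetical:

\[
\text{MDWT} = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|D_u|} \sum_{d \in D_u} W_{u,d} \qquad \text{[minutes per user-day]}
\]

where \(W_{u,d}\) is minutes watched by user \(u\) on day \(d\) (zero on inactive days), \(U\) is the set of users counted from first exposure, and \(D_u\) is user \(u\)'s eligible days in the window. Whether \(D_u\) includes zero-watch days is a deliberate denominator choice: including them measures overall engagement, excluding them measures intensity among active users.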
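
For part (b), a minimal Python sketch of three standard heavy‑tail treatments (winsorization, log1p transform, trimmed mean); the data is synthetic and the p99 cap is an illustrative choice, not a recommendation:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for per-user daily watch time in minutes; real data would be heavier-tailed.
watch_time = rng.lognormal(mean=2.0, sigma=1.0, size=10_000)

cap = np.percentile(watch_time, 99)           # winsorize at p99: cap outliers, keep every user
winsorized = np.minimum(watch_time, cap)

log_watch = np.log1p(watch_time)              # log1p compresses the tail and tolerates zeros

trimmed = stats.trim_mean(watch_time, 0.01)   # robust point estimate: drop top/bottom 1%

print(f"raw mean={watch_time.mean():.2f}  winsorized={winsorized.mean():.2f}  trimmed={trimmed:.2f}")

Winsorization keeps the minutes scale interpretable; the log transform shifts the estimand toward a geometric mean, which matters when translating a measured lift back into minutes.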
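
For part (c), a worked calculation under the stated assumptions (two-sample comparison of means with a normal approximation and equal allocation):

from scipy.stats import norm

# n per arm = 2 * (z_{1-α/2} + z_{1-β})^2 * σ^2 / δ^2
alpha, power = 0.05, 0.80
sigma = 18.0               # SD of daily watch time (min)
delta = 0.03 * 14.0        # +3% of the 14-min baseline = 0.42 min absolute

z_alpha = norm.ppf(1 - alpha / 2)   # ≈ 1.96
z_beta = norm.ppf(power)            # ≈ 0.84
n_per_arm = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
print(round(n_per_arm))             # ≈ 28,833 users per variant

If a Mann–Whitney U test is planned instead, a common conservative adjustment is to divide by its worst-case asymptotic relative efficiency of 0.864 (≈ 33,400 per arm), since its power depends on the unknown shape of the watch-time distribution.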
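
For part (d), a minimal sample‑ratio‑mismatch (SRM) check via a chi-square goodness-of-fit test; the bucket counts are made up for illustration. The same pattern extends to covariate balance by comparing pre-exposure covariate means across arms:

from scipy.stats import chisquare

observed = [150_200, 149_800]         # illustrative user counts: control, treatment
expected = [sum(observed) / 2] * 2    # equal allocation implies a 50/50 split
stat, p = chisquare(observed, f_exp=expected)
print(f"chi2={stat:.2f}, p={p:.3f}")  # a tiny p (commonly < 0.001) flags mis-randomization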
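
For part (e), a minimal CUPED sketch using each user's pre-experiment watch time as the covariate, with θ estimated in the standard way as cov(X, Y)/var(X); the data is synthetic:

import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.lognormal(2.0, 1.0, n)          # pre-period daily watch time (synthetic covariate)
y = 0.8 * x + rng.normal(0.0, 5.0, n)   # in-period metric, correlated with the covariate

theta = np.cov(x, y)[0, 1] / x.var(ddof=1)
y_cuped = y - theta * (x - x.mean())    # same expected mean, reduced variance

print(f"var(y)={y.var(ddof=1):.1f} -> var(y_cuped)={y_cuped.var(ddof=1):.1f}")

The key requirement is that the covariate is measured entirely pre-exposure, so treatment cannot affect it. For the regional pre-rollout, difference-in-differences additionally needs parallel trends across regions, which can be probed by plotting pre-period metric trends and running placebo tests on pre-rollout windows.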