You plan to increase the proportion of video pins surfaced in the home feed. Design a rigorous evaluation, then interpret the results provided below.
A) Experiment design
- Specify the unit of randomization (user-level vs. session-level) and justify the choice in light of network/content-supply interference and feed-ranking spillovers. State how you will cap per-user exposure to the new mix (e.g., from a 30% baseline video share to a 45% target) and how you will ramp.
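A user-level split with a deterministic hash keeps assignment stable across sessions and lets the ramp widen without reshuffling arms. A minimal sketch, in which the experiment name, salts, and ramp stages are illustrative assumptions, not a production system:

```python
import hashlib

def _unit(s: str) -> float:
    """Map a string to a stable pseudo-uniform value in [0, 1)."""
    h = int(hashlib.sha256(s.encode()).hexdigest(), 16)
    return (h % 10_000) / 10_000.0

def assign_bucket(user_id: str, experiment: str, ramp_pct: float) -> str:
    """User-level assignment with separate ramp and arm hashes.

    Using one hash for inclusion and an independent hash for the arm means
    widening the ramp (e.g., 1% -> 5% -> 20% -> 50%) pulls in new users
    without flipping anyone already exposed between treatment and control.
    """
    if _unit(f"{experiment}:ramp:{user_id}") >= ramp_pct:
        return "holdout"  # untouched traffic, also usable as long-term holdout
    if _unit(f"{experiment}:arm:{user_id}") < 0.5:
        return "treatment"  # e.g., 45% video share
    return "control"        # 30% baseline video share
```

The two-hash design is the key choice: a single hash threshold would reassign users between arms as the ramp fraction grows, contaminating the comparison.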
- Define primary success metrics with exact formulas (e.g., saves_per_user_day, clicks_per_impression, time_spent_per_user_day) and guardrails (e.g., complaint_rate = complaints / impressions, session_end_rate, creator churn, bandwidth cost per user). State the win/loss direction for each.
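The metric definitions can be made exact against raw event logs. A minimal sketch, assuming each row carries a user_id, day, and event type (these field names are hypothetical); ratio metrics are computed total-over-total, per-user-day metrics average over observed user-days:

```python
from collections import defaultdict

def feed_metrics(events):
    """Compute primary and guardrail metrics from raw event rows.

    events: iterable of dicts with keys user_id, day, event
            (event is one of 'impression', 'click', 'save', 'complaint').
    """
    counts = defaultdict(int)
    user_days = set()
    for e in events:
        counts[e["event"]] += 1
        user_days.add((e["user_id"], e["day"]))
    n_user_days = max(len(user_days), 1)
    n_impressions = max(counts["impression"], 1)
    return {
        # primary metrics (win = up)
        "saves_per_user_day": counts["save"] / n_user_days,
        "clicks_per_impression": counts["click"] / n_impressions,
        # guardrail (win = flat or down)
        "complaint_rate": counts["complaint"] / n_impressions,
    }
```

Writing the denominators out like this forces the win/loss direction question: per-impression ratios can improve simply because the impression mix shifts, so per-user-day totals are the safer primaries.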
- Outline power/MDE and duration assumptions (alpha, two-sided test, allocation, variance source), and how you will handle sequential looks or peeking (e.g., group sequential boundaries; CUPED to reduce variance and shorten the run). Include an A/A check and a novelty/fatigue plan (minimum run length and a long-term holdout).
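For the power section, the standard two-proportion normal-approximation formula gives a quick per-arm sample size. A stdlib-only sketch; the baseline rate, relative MDE, alpha, and power below are illustrative settings, and CUPED on a pre-period covariate would shrink the required n roughly in proportion to the variance it removes:

```python
from statistics import NormalDist

def proportion_sample_size(p0, mde_rel, alpha=0.05, power=0.8):
    """Per-arm n for a two-sided two-proportion z-test, equal allocation.

    p0: baseline rate (e.g., CTR = 0.03)
    mde_rel: smallest relative lift worth detecting (e.g., 0.05 = +5%)
    """
    p1 = p0 * (1 + mde_rel)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_b = NormalDist().inv_cdf(power)          # critical value for power
    p_bar = (p0 + p1) / 2
    numerator = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_b * (p0 * (1 - p0) + p1 * (1 - p1)) ** 0.5) ** 2
    return numerator / (p1 - p0) ** 2

# e.g., detecting a +5% relative lift on a 3% baseline CTR
n_per_arm = proportion_sample_size(p0=0.03, mde_rel=0.05)
```

Dividing n_per_arm by daily eligible traffic per arm gives the minimum duration; the novelty/fatigue plan then sets the floor on top of that (e.g., at least two full weekly cycles).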
- If an RCT is infeasible, propose a credible quasi-experiment (e.g., staggered-rollout difference-in-differences with user fixed effects plus inverse-propensity weighting, or synthetic control). List the identifying assumptions, diagnostics, and sensitivity checks.
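The DiD idea reduces, in the simplest 2x2 case, to a difference of pre/post differences. A toy stdlib estimator to make the identifying assumption concrete; a real staggered rollout needs an event-study / two-way-fixed-effects specification and care with heterogeneous treatment timing, which this sketch deliberately ignores:

```python
def did_estimate(panel):
    """Canonical 2x2 difference-in-differences.

    panel: iterable of (group, period, y) with group in {'treated','control'}
    and period in {'pre','post'}. Identifying assumption (parallel trends):
    absent treatment, treated outcomes would have moved like control's.
    """
    cells = {}
    for group, period, y in panel:
        cells.setdefault((group, period), []).append(y)
    mean = lambda xs: sum(xs) / len(xs)
    treated_change = mean(cells[("treated", "post")]) - mean(cells[("treated", "pre")])
    control_change = mean(cells[("control", "post")]) - mean(cells[("control", "pre")])
    return treated_change - control_change
```

The diagnostics listed above map directly onto this: pre-period placebo DiDs test parallel trends, and re-running under inverse-propensity weights probes selection into the rollout order.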
B) Interpret this 14-day readout (N ≈ 2.0M users; user-level randomization; robust SEs)
metric          | control_mean | treatment_mean | lift_% | p_value
----------------|--------------|----------------|--------|--------
CTR             | 3.00%        | 3.60%          | +20.0  | 0.010
avg_session_sec | 310          | 340            | +9.7   | 0.040
7d_retention    | 28.0%        | 27.0%          | -3.6   | 0.070
complaint_rate  | 0.50%        | 0.65%          | +30.0  | 0.030
- Provide a clear recommendation to the PM: roll out, iterate, or stop? Justify it using the metrics above, multiple-testing/guardrail considerations, and potential mitigations (e.g., capping video share for sensitive cohorts, rank-quality filters). State what additional data or follow-up analyses you would run before a full rollout.
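One concrete multiple-testing check on the readout is Benjamini-Hochberg at FDR q = 0.05 over the four reported p-values. Under this correction only the CTR win survives; note, though, that a guardrail such as complaint_rate should not be excused by multiplicity, since for harm metrics the conservative reading is to take the +30% lift seriously rather than demand corrected significance:

```python
def benjamini_hochberg(pvals, q=0.05):
    """BH step-up: return a reject/keep flag per p-value at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k
    return reject

readout = {"CTR": 0.010, "avg_session_sec": 0.040,
           "7d_retention": 0.070, "complaint_rate": 0.030}
flags = dict(zip(readout, benjamini_hochberg(list(readout.values()), q=0.05)))
```

This asymmetry (strict correction for wins, face-value caution for harms) is the core of the iterate-don't-ship argument for this readout.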