Experiment Design: New Search Ranking Feature
Context
You are designing, running, and analyzing an online controlled experiment to evaluate a new search ranking feature for a consumer app with logged-in users. The feature may exhibit cross-session carryover (users learn or form habits) and could create network/interference effects (e.g., popularity feedback loops, shared caches, or ranking signals that influence others).
Tasks
- Unit of randomization
  - Choose among user-level, session-level, or query-level randomization.
  - Justify your choice given likely cross-session carryover and potential network/interference.
- Metrics
  - Define a primary success metric (e.g., query-level search success or downstream conversion within 24 hours), including precise measurement windows and inclusion criteria.
  - Define guardrail metrics (e.g., latency, crash rate, ads revenue if applicable, bounce rate) and how they’ll be monitored.
- Power and sample size
  - Baseline click-through rate (CTR) is 10%; you seek a relative +2% uplift (to 10.2%). Use a two-sided α = 0.05 and power = 0.8.
  - Show the standard two-proportion z-test sample size formula and compute the required per-variant sample size.
  - Discuss how clustering (e.g., user-level correlation) or CUPED would change the requirement.
- Execution plan
  - Outline SRM checks; triggered vs. intent-to-treat analyses; bucketing consistency across services; novelty effects/burn-in; and sequential monitoring without inflating Type I error.
- Heterogeneity
  - Pre-register segments (e.g., head vs. tail queries, country, device) and describe how you would test for treatment-by-segment interaction while controlling false discovery.
- Interference and long-term effects
  - If ranking changes affect supply/demand dynamics or popularity feedback, propose cluster-randomization or switchback testing and how to interpret results.
- Rollout
  - Define stop/go criteria and a safe ramp plan.
  - Explain how to update ML training data post-experiment to avoid entangling model training with experimental exposure.
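
For the power and sample size task, the calculation can be sketched as follows. This is a minimal sketch that treats each observation (for example, a search impression) as independent and uses the common pooled-under-null, unpooled-under-alternative form of the two-proportion formula. The helper names, the cluster size of 5, the intra-cluster correlation of 0.05, and the CUPED correlation of 0.5 are illustrative assumptions, not values given in the prompt.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_sample_size(p1, p2, alpha=0.05, power=0.8):
    """Per-variant n for a two-sided two-proportion z-test.

    n = (z_{1-alpha/2} * sqrt(2 * p_bar * (1 - p_bar))
         + z_{power} * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2 / (p2 - p1)^2
    where p_bar = (p1 + p2) / 2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def design_effect(mean_cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def cuped_factor(rho):
    """Variance reduction from CUPED: 1 - rho^2, where rho is the
    correlation between the metric and its pre-experiment covariate."""
    return 1 - rho ** 2


# Values from the prompt: 10% baseline CTR, 10.2% target.
n_iid = two_proportion_sample_size(0.10, 0.102)

# Hypothetical adjustments (assumed values, for illustration only).
n_clustered = ceil(n_iid * design_effect(mean_cluster_size=5, icc=0.05))
n_cuped = ceil(n_iid * cuped_factor(rho=0.5))

print(n_iid, n_clustered, n_cuped)
```

Under the independence assumption this comes out to roughly 356,000 observations per variant. Clustering multiplies the requirement by the design effect, while CUPED shrinks it in proportion to the variance explained by the pre-period covariate.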
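
For bucketing consistency across services in the execution plan, one common approach is deterministic hash-based assignment, so that any service given the same user ID and experiment salt computes the same variant without a shared lookup. The sketch below assumes user-level randomization; the function name `assign_variant` and the salt `"search-ranking-v2"` are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment_salt: str,
                   treatment_share: float = 0.5) -> str:
    """Deterministically map a user to a variant.

    The salt keeps assignments independent across experiments; hashing
    makes the result reproducible in any service that knows the salt.
    """
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64
    return "treatment" if bucket < treatment_share else "control"


# Hypothetical salt and user IDs, for illustration only.
SALT = "search-ranking-v2"
assignments = [assign_variant(f"user-{i}", SALT) for i in range(10_000)]
treatment_count = assignments.count("treatment")
print(treatment_count, len(assignments) - treatment_count)
```

Because assignment depends only on the user ID and the salt, a user stays in the same arm across sessions and devices while logged in, which also limits contamination from cross-session carryover.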
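
For the SRM check in the execution plan, a chi-square goodness-of-fit test on the assigned unit counts is a typical choice. The sketch below uses only the standard library and relies on the fact that, with two arms, the statistic has one degree of freedom. The counts and the 0.001 alert threshold are assumptions for illustration.

```python
from math import erfc, sqrt


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """Chi-square goodness-of-fit p-value for a two-arm sample ratio check."""
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # Survival function of chi-square with 1 degree of freedom:
    # P(Z^2 > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


SRM_ALPHA = 0.001  # assumed alert threshold, deliberately strict

# Hypothetical assignment counts under an intended 50/50 split.
p_balanced = srm_p_value(500_000, 500_600)
p_skewed = srm_p_value(500_000, 504_000)

print(p_balanced < SRM_ALPHA, p_skewed < SRM_ALPHA)
```

A flagged mismatch usually points to a logging, triggering, or bucketing bug, so the sensible response is to investigate before interpreting any metric results.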
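
For the heterogeneity task, one way to control false discovery across pre-registered segment interaction tests is the Benjamini-Hochberg step-up procedure. This is a sketch of that one option, not the only valid choice; the p-values below are made up for illustration and are assumed to come from separate treatment-by-segment interaction tests.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical interaction-test p-values for the pre-registered segments.
segments = ["head_vs_tail", "country", "device"]
interaction_p = [0.004, 0.03, 0.20]

for name, is_rejected in zip(segments, benjamini_hochberg(interaction_p)):
    print(name, is_rejected)
```

Segments that survive the correction are candidates for follow-up rather than proof of a differential effect, since the experiment is typically powered for the overall effect and not for each interaction.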