Experiment Design: Evaluating a New Pro Ranking Algorithm (Ranker) in a Two‑Sided Marketplace
You are designing an experiment to evaluate a new pro ranking algorithm in search results for customer requests, while minimizing marketplace interference and supply cannibalization.
Provide the following:
Primary outcome and guardrails
- Define one primary metric (e.g., booking conversion per request), with a clear measurement window.
- Specify at least four guardrail metrics (e.g., time-to-first-quote, cancellation rate, average pro response latency, pro earnings dispersion/fairness).
- For each metric, provide an exact formula and acceptable threshold delta (absolute or relative).
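The formulas can be pinned down in code. This sketch is illustrative only. It assumes a hypothetical per-request record (`RequestRecord`) with a 7-day measurement window, measures earnings dispersion with a Gini coefficient over per-pro earnings, and uses placeholder thresholds in `THRESHOLDS` that would need to be agreed on before launch. None of these names or numbers come from the brief.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class RequestRecord:
    # Hypothetical per-request record; field names are illustrative only.
    booked: bool                          # booked within the window (assumed 7 days)
    cancelled: bool                       # booking later cancelled (meaningful only if booked)
    first_quote_minutes: Optional[float]  # minutes to first quote; None if no quote


def booking_conversion(reqs: List[RequestRecord]) -> float:
    # Primary metric: requests booked within the window / eligible requests.
    return sum(r.booked for r in reqs) / len(reqs)


def cancellation_rate(reqs: List[RequestRecord]) -> float:
    # Guardrail: cancelled bookings / bookings.
    booked = [r for r in reqs if r.booked]
    return sum(r.cancelled for r in booked) / len(booked) if booked else 0.0


def median_time_to_first_quote(reqs: List[RequestRecord]) -> float:
    # Guardrail: median minutes to first quote among requests that received one.
    return median(r.first_quote_minutes for r in reqs if r.first_quote_minutes is not None)


def gini(earnings: List[float]) -> float:
    # Guardrail for earnings dispersion: Gini coefficient over per-pro earnings
    # (0 = perfectly equal; (n - 1) / n = one pro earns everything).
    x = sorted(earnings)
    n, total = len(x), sum(x)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Illustrative placeholder thresholds (not from the brief): (max_delta, is_relative)
THRESHOLDS = {
    "cancellation_rate": (0.005, False),          # at most +0.5 pp absolute
    "median_time_to_first_quote": (0.05, True),   # at most +5% relative
    "pro_response_latency": (0.05, True),         # at most +5% relative
    "earnings_gini": (0.02, False),               # at most +0.02 absolute
}


def guardrail_ok(control: float, treatment: float, max_delta: float, relative: bool = False) -> bool:
    # Assumes "higher is worse" (true for every guardrail above); delta = treatment - control.
    delta = treatment - control
    if relative:
        delta /= control
    return delta <= max_delta
```

Each guardrail is then checked as `guardrail_ok(control_value, treatment_value, *THRESHOLDS[name])`. Because the check treats a larger value as worse, a metric where a decrease is harmful would need the comparison reversed.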
Randomization unit and design
- Choose a randomization unit: request-level, customer-level, geography-level cluster, or switchback by region-hour.
- Justify your choice in terms of how well it reduces cross-unit interference, given that pros can serve multiple requests.
- Describe controls to prevent pros from systematically over-serving one arm.
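For a switchback by region-hour, one way to make assignment deterministic and auditable is to hash the region and the time block together with an experiment salt. The sketch below assumes that design; the salt `ranker-exp-v1`, the 0.2 tolerance, and the 20-impression minimum are hypothetical. Blocks are cut on UTC epoch hours, so daylight saving changes cannot create duplicated or missing local hours. The second function is one possible monitor for the over-serving concern: it flags pros whose share of impressions in the treatment arm drifts far from the 1:1 target.

```python
import hashlib
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

SALT = "ranker-exp-v1"  # hypothetical experiment salt; changing it re-randomizes


def switchback_arm(region: str, ts: datetime, block_hours: int = 1, salt: str = SALT) -> str:
    """Assign a (region, time-block) cluster to an arm with a 1:1 hash split."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Blocks are cut on UTC epoch hours, so DST changes never stretch or repeat a block.
    block = int(ts.timestamp() // 3600) // block_hours
    digest = hashlib.sha256(f"{salt}:{region}:{block}".encode()).digest()
    return "treatment" if digest[0] % 2 == 0 else "control"


def imbalanced_pros(
    impressions: Iterable[Tuple[str, str]],
    tolerance: float = 0.2,
    min_impressions: int = 20,
) -> List[str]:
    """Flag pros whose treatment share of impressions is far from the 50% target.

    impressions: (pro_id, arm) pairs, one per time the pro was shown in search results.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"treatment": 0, "control": 0})
    for pro_id, arm in impressions:
        counts[pro_id][arm] += 1
    flagged = []
    for pro_id, c in counts.items():
        total = c["treatment"] + c["control"]
        # Skip pros with too few impressions for the share to be meaningful.
        if total >= min_impressions and abs(c["treatment"] / total - 0.5) > tolerance:
            flagged.append(pro_id)
    return flagged
```

Flagged pros could then be capped or rate-limited per block; that policy is a design decision the sketch does not implement.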
Power and duration
- Given: baseline booking conversion = 12%, target relative lift = 5% (from 12.0% to 12.6%), alpha = 0.05 (two-sided), power = 90%, and 50,000 eligible requests/day.
- Estimate the required sample size per arm and the runtime in days under 1:1 allocation.
- Show formulas and assumptions (e.g., pooled variance for a two-proportion z-test).
- Explain how clustering or switchback designs inflate variance (the design effect), state a plausible intraclass correlation coefficient (ICC), and revise the runtime accordingly.
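Under the stated inputs, p1 = 0.12 and p2 = 0.126, so the absolute difference is 0.006. A standard per-arm sample size for a two-sided, two-proportion z-test, with pooled variance under the null, is n = [z(1 - α/2)·√(2·p̄(1 - p̄)) + z(1 - β)·√(p1(1 - p1) + p2(1 - p2))]² / (p2 - p1)², where p̄ = (p1 + p2)/2. Clustering inflates this by the design effect DEFF = 1 + (m - 1)·ICC, where m is the average number of requests per cluster. The sketch computes both; the 50-region count and the ICC of 0.01 are assumptions chosen only to illustrate the arithmetic.

```python
import math
from statistics import NormalDist


def n_per_arm(p1: float, rel_lift: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Sample size per arm for a two-sided, two-proportion z-test.

    Pooled variance under H0, unpooled variance under H1.
    """
    p2 = p1 * (1 + rel_lift)
    delta = p2 - p1
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # about 1.28 for 90% power
    numerator = (
        z_a * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / delta ** 2)


def design_effect(avg_cluster_size: float, icc: float) -> float:
    # Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC.
    return 1 + (avg_cluster_size - 1) * icc


def runtime_days(n_arm: int, requests_per_day: int, arms: int = 2) -> int:
    # Whole days needed to collect n_arm requests in every arm.
    return math.ceil(n_arm * arms / requests_per_day)


n = n_per_arm(0.12, 0.05)          # roughly 63,000 requests per arm
days = runtime_days(n, 50_000)     # runtime if requests were independent

# Hypothetical switchback: 50 regions x 24 hourly blocks per day.
m = 50_000 / (50 * 24)             # about 42 requests per region-hour
n_clustered = math.ceil(n * design_effect(m, icc=0.01))
days_clustered = runtime_days(n_clustered, 50_000)
```

Under independent requests this gives roughly 63,000 requests per arm (about 126,000 in total), or 3 days at 50,000 requests/day. With about 42 requests per region-hour and ICC = 0.01, the design effect is about 1.41 and the runtime rises to 4 days; ICC = 0.05 gives a design effect of about 3.03 and roughly 8 days. Because adjacent switchback blocks can also be autocorrelated, the design-effect formula is only an approximation, and the ICC is better estimated from pre-period data than assumed.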
Bias controls
- Specify pre-experiment checks (e.g., covariate balance using standardized differences).
- Propose variance reduction strategies (e.g., CUPED using pre-period request conversion, stratification by category/region).
- Explain how you will handle repeated customers, daylight saving time transitions, and time-of-day effects.
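Both checks are short to express. The sketch below computes the standardized difference for one pre-period covariate (|d| > 0.1 is a commonly used rule of thumb for meaningful imbalance) and a CUPED adjustment with θ = cov(Y, X)/var(X) estimated on both arms pooled. Imputing the mean of X for customers with no pre-period history is an assumption of this sketch, as are the function names.

```python
import math
from statistics import mean, variance
from typing import List, Optional, Sequence, Tuple


def standardized_difference(x_treat: Sequence[float], x_ctrl: Sequence[float]) -> float:
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2) for one pre-period covariate."""
    pooled_sd = math.sqrt((variance(x_treat) + variance(x_ctrl)) / 2)
    return (mean(x_treat) - mean(x_ctrl)) / pooled_sd if pooled_sd > 0 else 0.0


def cuped(y: Sequence[float], x: Sequence[Optional[float]]) -> Tuple[List[float], float]:
    """CUPED: y_adj = y - theta * (x - mean(x)), theta = cov(y, x) / var(x).

    x is a pre-period covariate (e.g., pre-period conversion). Units with no
    history (None) are imputed at the mean of x, so they receive no adjustment.
    Pass both arms together so theta is estimated on pooled data.
    """
    observed = [xi for xi in x if xi is not None]
    mx_obs = mean(observed)
    xs = [mx_obs if xi is None else xi for xi in x]
    mx, my = mean(xs), mean(y)
    n = len(xs)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, y)) / (n - 1)
    var_x = sum((xi - mx) ** 2 for xi in xs) / (n - 1)
    theta = cov / var_x if var_x > 0 else 0.0
    return [yi - theta * (xi - mx) for xi, yi in zip(xs, y)], theta
```

The adjusted outcomes are then compared between arms with the usual estimator. X must be measured before exposure so treatment cannot affect it; under that condition the treatment effect estimate is essentially unchanged while its variance shrinks by roughly a factor of (1 - ρ²), where ρ is the correlation between X and Y.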
Monitoring and stopping
- Propose a sequential monitoring plan (e.g., O’Brien–Fleming boundaries or another alpha-spending function) and anomaly triggers.
- Define the decision rule if the primary metric improves but a guardrail threshold is breached.
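A sketch of the monitoring pieces, under several assumptions. Cumulative alpha follows the Lan-DeMets O'Brien-Fleming-type spending function α(t) = 2 - 2Φ(z(1 - α/2)/√t) at information fraction t. Per-look bounds are derived conservatively by testing each look at its alpha increment (a Bonferroni split), not by the exact recursive computation that group-sequential software such as the R packages gsDesign or rpact performs. A sample-ratio-mismatch (SRM) test serves as one anomaly trigger. Finally, `decide` encodes one hypothetical decision rule in which a guardrail counts as breached only when its whole confidence interval lies past its threshold.

```python
from statistics import NormalDist
from typing import Dict, List, Sequence, Tuple

N = NormalDist()


def obf_spent(t: float, alpha: float = 0.05) -> float:
    """Lan-DeMets O'Brien-Fleming-type cumulative alpha spent at information fraction t."""
    return 2 * (1 - N.cdf(N.inv_cdf(1 - alpha / 2) / t ** 0.5))


def conservative_z_bounds(fractions: Sequence[float], alpha: float = 0.05) -> List[float]:
    """Per-look two-sided z bounds from the alpha increments (Bonferroni across looks).

    Conservative: exact group-sequential bounds, which account for correlation
    between looks, are somewhat lower at later looks.
    """
    bounds, prev = [], 0.0
    for t in fractions:
        spent = obf_spent(t, alpha)
        bounds.append(N.inv_cdf(1 - (spent - prev) / 2))
        prev = spent
    return bounds


def srm_p_value(n_treat: int, n_ctrl: int) -> float:
    """Sample-ratio-mismatch check for a 1:1 split (equivalent to chi-square, 1 df)."""
    n = n_treat + n_ctrl
    z = (n_treat - n / 2) / (n / 4) ** 0.5
    return 2 * (1 - N.cdf(abs(z)))


def decide(primary_z: float, primary_bound: float,
           guardrail_ci: Dict[str, Tuple[float, float]],
           max_delta: Dict[str, float]) -> str:
    """Hypothetical rule. guardrail_ci holds CIs for treatment - control (higher = worse)."""
    breached = [g for g, (lo, _) in guardrail_ci.items() if lo > max_delta[g]]
    unclear = [g for g, (lo, hi) in guardrail_ci.items() if lo <= max_delta[g] < hi]
    if breached:
        return "stop_do_not_ship"  # a guardrail is clearly worse than its threshold
    if primary_z >= primary_bound:
        return "hold_review_guardrails" if unclear else "ship"
    return "continue"
```

Under this rule, a primary win with a breached guardrail does not ship, and a win with an inconclusive guardrail is held for review (for example, by extending the run or examining the affected segments) rather than shipped automatically.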
Readout
- Define the difference-in-means estimator and the standard error approach.
- Describe heterogeneity analyses by category, region, and traffic source.
- Explain how you will attribute uplift versus cannibalization across supply-limited segments.
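One way to make the readout concrete is to estimate each arm's conversion as total bookings over total requests and take the difference. The standard error comes from the delta method computed over the randomization clusters (for example, region-hours), so correlation within a cluster is reflected in the SE. The sketch assumes cluster-level input rows and at least two clusters per arm in each segment; the function names are hypothetical. Under customer-level randomization, customers would be the clusters, which also accounts for repeated customers.

```python
import math
from collections import defaultdict
from statistics import NormalDist
from typing import Dict, Iterable, Sequence, Tuple

Cluster = Tuple[float, float]  # (bookings, requests) for one randomized cluster


def ratio_and_var(clusters: Sequence[Cluster]) -> Tuple[float, float]:
    """Conversion = total bookings / total requests, with a delta-method
    variance computed over clusters (requires at least 2 clusters)."""
    k = len(clusters)
    ys = [b for b, _ in clusters]
    xs = [r for _, r in clusters]
    ybar, xbar = sum(ys) / k, sum(xs) / k
    ratio = ybar / xbar
    s_yy = sum((y - ybar) ** 2 for y in ys) / (k - 1)
    s_xx = sum((x - xbar) ** 2 for x in xs) / (k - 1)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (k - 1)
    var = (s_yy - 2 * ratio * s_xy + ratio ** 2 * s_xx) / (k * xbar ** 2)
    return ratio, max(var, 0.0)  # guard against tiny negative rounding error


def diff_in_means(treat: Sequence[Cluster], ctrl: Sequence[Cluster], alpha: float = 0.05):
    """Treatment minus control conversion, its SE, and a two-sided normal CI."""
    r_t, v_t = ratio_and_var(treat)
    r_c, v_c = ratio_and_var(ctrl)
    diff = r_t - r_c
    se = math.sqrt(v_t + v_c)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, se, (diff - z * se, diff + z * se)


def by_segment(rows: Iterable[Tuple[str, str, float, float]]) -> Dict[str, tuple]:
    """rows: (segment, arm, bookings, requests) per cluster; arm is 'treatment' or 'control'."""
    groups = defaultdict(lambda: {"treatment": [], "control": []})
    for segment, arm, bookings, requests in rows:
        groups[segment][arm].append((bookings, requests))
    return {s: diff_in_means(g["treatment"], g["control"]) for s, g in groups.items()}
```

`by_segment` applies the same estimator within each category, region, or traffic source; with many segments, these estimates are best read as exploratory or adjusted for multiple comparisons. Running it separately for supply-limited and supply-rich segments gives one descriptive view of where uplift concentrates, but on its own it does not separate genuine uplift from cannibalization.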