Design a robust pro-ranking A/B test
Company: Thumbtack
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Onsite
Thumbtack plans to change the pro ranking algorithm in search results for customer requests. Design an experiment to evaluate the new ranker while minimizing marketplace interference and supply cannibalization.
Provide:
1) Primary outcome and guardrails: Define a single primary metric (e.g., booking conversion per request) and at least four guardrails (e.g., time-to-first-quote, cancellation rate, average pro response latency, pro earnings dispersion/fairness) with exact formulas and acceptable threshold deltas.
2) Randomization unit and design: Choose between request-level, customer-level, geography-level cluster, or switchback by region-hour. Justify your choice by how well it reduces cross-unit interference, given that pros can serve multiple requests. Describe how you’ll prevent pros from systematically over-serving one arm.
3) Power and duration: Given baseline booking conversion = 12%, target relative lift = 5%, alpha = 0.05 (two-sided), power = 90%, and an average of 50,000 eligible requests/day, estimate required sample size per arm and the runtime in days under 1:1 allocation. Show formulas/assumptions (e.g., pooled variance for two-proportion z-test). State how clustering or switchback inflates variance (design effect) and incorporate a plausible ICC to revise the runtime.
4) Bias controls: Specify pre-experiment checks (e.g., covariate balance) and variance-reduction methods (e.g., CUPED using pre-period request conversion, or stratification by category/region). Explain how you will handle repeated customers and daylight saving/time-of-day effects.
5) Monitoring and stopping: Propose a sequential monitoring plan (e.g., O’Brien–Fleming or alpha spending) and anomaly triggers. Define what happens if a guardrail breaches but primary improves.
6) Readout: Detail the difference-in-means estimator, heterogeneity by category/region/traffic source, and how you would attribute uplift vs. cannibalization across supply-limited segments.
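As a rough illustration of the arithmetic item 3 asks for, the sketch below applies the standard two-proportion z-test sample-size formula with pooled variance under the null, then inflates the result by a design effect DEFF = 1 + (m - 1) * ICC. The function names, the cluster size m = 100 requests per region-hour, and ICC = 0.01 are hypothetical values chosen only for illustration; they do not come from the prompt and would need to be estimated from historical data.

```python
import math
from statistics import NormalDist


def n_per_arm(p1, rel_lift, alpha=0.05, power=0.90):
    """Per-arm sample size for a two-sided two-proportion z-test.

    Uses the pooled variance under H0 and the unpooled variance under H1.
    """
    p2 = p1 * (1 + rel_lift)              # treatment rate implied by the relative lift
    p_bar = (p1 + p2) / 2                 # pooled rate under 1:1 allocation
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    numerator = (
        z_a * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def design_effect(cluster_size, icc):
    """Variance inflation for cluster or switchback randomization."""
    return 1 + (cluster_size - 1) * icc


def runtime_days(n_arm, daily_requests, deff=1.0):
    """Days needed to collect both arms at the given daily volume."""
    return math.ceil(2 * n_arm * deff / daily_requests)


# Inputs from the prompt: 12% baseline, 5% relative lift, 50,000 requests/day.
n = n_per_arm(0.12, 0.05)
# ASSUMPTION: 100 requests per region-hour cluster and ICC = 0.01 (illustrative only).
deff = design_effect(cluster_size=100, icc=0.01)

print("n per arm:", n)
print("days, request-level:", runtime_days(n, 50_000))
print("design effect:", deff)
print("days, clustered:", runtime_days(n, 50_000, deff))
```

Under these inputs the formula gives roughly 63,000 requests per arm, or about 3 days at 50,000 requests/day with request-level randomization. A design effect near 2 roughly doubles that. In practice the run would likely be extended to cover at least one full weekly cycle regardless of what the formula returns.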
Quick Answer: This question evaluates experimental design and causal inference competencies for two-sided marketplaces, focusing on metric definition and guardrails, randomization to limit interference, power and sample-size estimation, bias controls, monitoring and sequential stopping, and attribution of uplift versus cannibalization.
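The CUPED adjustment named in item 4 can be sketched as follows. This is a minimal illustration on simulated data, not a production implementation: the helper name `cuped_adjust`, the simulated pre-period covariate, and the noise levels are all assumptions made for the example. The covariate must be measured before assignment, and theta is usually estimated on data pooled across both arms.

```python
import random
from statistics import fmean, pvariance


def cuped_adjust(y, x):
    """Return y - theta * (x - mean(x)), with theta = cov(x, y) / var(x)."""
    mx, my = fmean(x), fmean(y)
    cov = fmean((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    theta = cov / pvariance(x, mx)
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


# Simulated unit-level data (hypothetical): x is a pre-period conversion rate,
# y is the in-experiment conversion rate, correlated with x plus noise.
random.seed(0)
x = [random.gauss(0.12, 0.03) for _ in range(2000)]
y = [xi + random.gauss(0.0, 0.02) for xi in x]

y_adj = cuped_adjust(y, x)

print("variance before:", pvariance(y))
print("variance after: ", pvariance(y_adj))
print("mean shift:     ", fmean(y_adj) - fmean(y))
```

The adjustment leaves the mean unchanged and removes the share of variance explained by the pre-period covariate, so the achievable reduction depends on how strongly pre-period and in-experiment conversion are correlated. With request-level randomization a single request has no pre-period history, so the covariate would likely be defined at the customer or region level.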