Experiment Design: Estimating Causal Impact of a New Rider ETA Model in a Two-Sided Marketplace
Context
You are testing a new rider ETA model that changes the ETA shown in the app. Because drivers roam across zones, interference between units is likely (violating SUTVA if not handled). Design an experiment to estimate the causal effect on rider experience (demand) and overall marketplace health (supply/quality), accounting for interference and partial compliance.
Tasks
- Metrics
  - Define up to 3 primary success metrics (demand) and up to 5 guardrails (supply/quality).
  - Provide exact formulas and units; for example: request→trip conversion = completed_requests / total_requests; driver earnings/hour; post-dispatch cancellation within 5 minutes; pickup ETA p50.
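As an illustration of what "exact formulas and units" can look like, the sketch below computes three of the example metrics from raw request records. The record keys (`completed`, `dispatched`, `cancel_after_dispatch_s`, `pickup_eta_s`) and the choice of dispatched requests as the cancellation denominator are assumptions made for this example, not definitions from the prompt.

```python
from statistics import median

def demand_metrics(requests):
    """Compute example metrics from a list of request records.

    Each record is a dict with hypothetical keys:
      completed (bool), dispatched (bool),
      cancel_after_dispatch_s (seconds from dispatch to cancel, or None),
      pickup_eta_s (pickup ETA in seconds, or None if never dispatched).
    """
    dispatched = [r for r in requests if r["dispatched"]]
    completed = sum(1 for r in requests if r["completed"])
    cancels_5min = sum(
        1 for r in dispatched
        if r["cancel_after_dispatch_s"] is not None
        and r["cancel_after_dispatch_s"] <= 300
    )
    return {
        # unitless ratio in [0, 1]
        "request_to_trip_conversion": completed / len(requests),
        # share of dispatched requests cancelled within 300 s of dispatch
        "post_dispatch_cancel_rate_5min": cancels_5min / len(dispatched),
        # seconds
        "pickup_eta_p50_s": median(r["pickup_eta_s"] for r in dispatched),
    }

# Toy data, invented for illustration only.
sample = [
    {"completed": True, "dispatched": True, "cancel_after_dispatch_s": None, "pickup_eta_s": 240},
    {"completed": False, "dispatched": True, "cancel_after_dispatch_s": 120, "pickup_eta_s": 300},
    {"completed": False, "dispatched": True, "cancel_after_dispatch_s": 600, "pickup_eta_s": 360},
    {"completed": False, "dispatched": False, "cancel_after_dispatch_s": None, "pickup_eta_s": None},
]
metrics = demand_metrics(sample)
```

Writing the denominator into the code forces a decision the prose can leave vague, for example whether cancellation is measured per request or per dispatched request.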
- Randomization Unit
  - Choose the randomization unit among rider, trip, driver, or zone (geohash-6) × hour.
  - Justify the choice with respect to SUTVA/interference.
  - Specify blocking/stratification: city tier × weekday/weekend.
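One possible way to implement blocked assignment is sketched below: units (for instance zone × hour cells) are grouped by stratum, shuffled with a fixed seed, and split evenly within each stratum. The function name, the unit IDs, and the stratum labels are hypothetical.

```python
import random
from collections import defaultdict

def blocked_assignment(units, seed=0):
    """Assign units to arms within strata.

    units: list of (unit_id, stratum) pairs, where stratum could be
    a (city_tier, day_type) tuple. Returns {unit_id: arm}.
    """
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    by_stratum = defaultdict(list)
    for unit_id, stratum in units:
        by_stratum[stratum].append(unit_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])
        rng.shuffle(ids)
        half = len(ids) // 2  # with an odd count, control gets the extra unit
        for position, unit_id in enumerate(ids):
            assignment[unit_id] = "treatment" if position < half else "control"
    return assignment

# Eight hypothetical zone-hour cells in two strata.
units = [
    (f"cell{i}", ("tier1" if i < 4 else "tier2", "weekday")) for i in range(8)
]
assignment = blocked_assignment(units, seed=42)
```

Splitting within each stratum guarantees balance on city tier and day type by construction instead of relying on chance.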
- If Unit Randomization Is Infeasible
  - Propose a stepped-wedge rollout with a 10% geo holdout.
  - Describe contamination controls (e.g., driver shadow pools, sticky assignment, exclusion buffers).
  - Explain how you will measure and cap cross-arm exposure.
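Cross-arm exposure can be measured in several ways. The sketch below uses one simple definition: for each driver, the share of trips served outside the arm in which that driver served most trips. The 10% cap and the input format are illustrative assumptions, not values from the prompt.

```python
from collections import Counter, defaultdict

def cross_arm_exposure(trips):
    """trips: iterable of (driver_id, arm) pairs, one per completed trip.

    Returns {driver_id: share of trips outside the driver's majority arm}.
    """
    counts = defaultdict(Counter)
    for driver_id, arm in trips:
        counts[driver_id][arm] += 1
    exposure = {}
    for driver_id, arm_counts in counts.items():
        total = sum(arm_counts.values())
        minority = total - max(arm_counts.values())
        exposure[driver_id] = minority / total
    return exposure

def drivers_over_cap(exposure, cap=0.10):
    """Drivers whose cross-arm share exceeds the (hypothetical) cap."""
    return sorted(d for d, share in exposure.items() if share > cap)

# Toy trips: d1 works mostly in treatment zones, d2 only in treatment.
trips = [("d1", "treatment")] * 8 + [("d1", "control")] * 2 + [("d2", "treatment")] * 10
exposure = cross_arm_exposure(trips)
flagged = drivers_over_cap(exposure)
```

Drivers above the cap could be excluded in a pre-specified sensitivity analysis, or their trips could be down-weighted; either rule belongs in the pre-registration.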
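- Instrumentation and Estimation
  - Define an exposure variable that confirms the rider UI actually showed the new ETA.
  - Specify an estimator of the effect of actual exposure that is robust to partial compliance (a naive as-treated comparison is confounded by who ends up exposed; consider ITT alongside an IV/CACE estimate with assignment as the instrument), and explain how you will bound bias from post-randomization selection.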
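A common approach to partial compliance is to report the intention-to-treat (ITT) effect together with a Wald/IV estimate of the complier average causal effect (CACE), using random assignment as the instrument for exposure. The sketch below assumes the exclusion restriction and monotonicity hold and ignores clustering when forming means; the row format is hypothetical.

```python
def itt_and_cace(rows):
    """rows: list of (assigned, exposed, outcome) with assigned, exposed in {0, 1}.

    Returns (itt, first_stage, cace) where
      itt         = E[Y | Z=1] - E[Y | Z=0]
      first_stage = E[D | Z=1] - E[D | Z=0]
      cace        = itt / first_stage   (Wald estimator)
    """
    def mean(values):
        return sum(values) / len(values)

    y_treat = mean([y for z, d, y in rows if z == 1])
    y_ctrl = mean([y for z, d, y in rows if z == 0])
    d_treat = mean([d for z, d, y in rows if z == 1])
    d_ctrl = mean([d for z, d, y in rows if z == 0])

    itt = y_treat - y_ctrl
    first_stage = d_treat - d_ctrl
    return itt, first_stage, itt / first_stage

# Toy data: half of the assigned units actually saw the new ETA.
rows = [
    (1, 1, 1.0), (1, 1, 1.0), (1, 0, 0.0), (1, 0, 0.0),
    (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 1.0),
]
itt, first_stage, cace = itt_and_cace(rows)
```

Because both numerator and denominator compare groups defined by randomized assignment, the ratio avoids conditioning on the post-randomization exposure variable directly.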
- Estimation Design Comparison
  - Compare classic A/B vs synthetic control vs difference-in-differences: when each is valid here.
  - Describe pre-trend diagnostics you will run.
  - Explain how to build city-level synthetic controls (donor pool, regularization, covariates).
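One simple pre-trend diagnostic is a placebo difference-in-differences: run the same estimator on two halves of the pre-period, where the true effect is zero by construction. The sketch below uses group means on invented daily values; a real analysis would add standard errors and an event-study plot.

```python
def did(treated_pre, treated_post, control_pre, control_post):
    """Two-period difference-in-differences on group means."""
    def mean(values):
        return sum(values) / len(values)
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Invented daily metric values for illustration.
treated_pre, treated_post = [10, 11, 12, 13], [16, 17]
control_pre, control_post = [8, 9, 10, 11], [12, 13]

# Placebo: treat the second half of the pre-period as if it were "post".
placebo = did(treated_pre[:2], treated_pre[2:], control_pre[:2], control_pre[2:])

# Actual estimate using the full pre-period.
estimate = did(treated_pre, treated_post, control_pre, control_post)
```

A placebo estimate far from zero suggests the parallel-trends assumption is doubtful, which would favor a synthetic control or the randomized design instead.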
- Powering
  - Compute the required sample size to detect a 1.5% relative drop in cancellations with baseline 12%, power 80%, alpha 0.05, ICC = 0.02, average 1,500 trips/day/zone over 28 days.
  - Show the clustered variance formula, the design effect, and the minimum number of zones per arm.
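The calculation can be sketched as follows, under assumptions the prompt leaves open: a two-sided test, equal cluster sizes, the zone (pooled over 28 days) as the cluster, and the unpooled two-proportion variance. With m trips per zone and k zones per arm, the clustered variance of an arm's cancellation rate is p(1 − p) / (k·m) × DEFF, where DEFF = 1 + (m − 1) × ICC. A 1.5% relative drop from 12% is an absolute change of 0.18 percentage points (12% → 11.82%).

```python
import math
from statistics import NormalDist

def zones_per_arm(p_base, rel_change, alpha, power, icc, trips_per_zone):
    """Return (individual-level n per arm, design effect, zones per arm)."""
    p_new = p_base * (1 - rel_change)
    delta = p_base - p_new
    # Sum of the two-sided critical value and the power quantile.
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Sample size per arm if trips were independent.
    n_individual = z**2 * (p_base * (1 - p_base) + p_new * (1 - p_new)) / delta**2
    # Design effect for equal-sized clusters.
    deff = 1 + (trips_per_zone - 1) * icc
    # Inflate by the design effect, then convert trips to zones.
    zones = math.ceil(n_individual * deff / trips_per_zone)
    return n_individual, deff, zones

n_ind, deff, zones = zones_per_arm(
    p_base=0.12, rel_change=0.015, alpha=0.05, power=0.80,
    icc=0.02, trips_per_zone=1500 * 28,
)
print(f"n per arm (independent trips): {n_ind:,.0f}")
print(f"design effect: {deff:.2f}")
print(f"zones per arm: {zones:,}")
```

With 42,000 trips per zone the design effect is about 841, so the required number of zones is driven almost entirely by the ICC: roughly n × ICC, which is on the order of ten thousand zones per arm under these assumptions. This is why variance reduction or a larger detectable effect usually has to be part of the answer.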
- Biases and Mitigations
  - List at least four biases (e.g., demand shocks, surge, time-of-day heterogeneity, learning/fatigue) and a mitigation for each (blocking, CUPED with pre-period cancels, calendar alignment, heterogeneity pre-spec).
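As one example of these mitigations, CUPED adjusts each unit's outcome using a pre-period covariate (here, pre-period cancellations) to reduce variance without changing the mean. The sketch below is a minimal version with a single covariate and toy numbers; in practice theta is estimated on data pooled across arms.

```python
def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x)).

    y: in-experiment outcomes per unit
    x: pre-period covariate per unit (same order as y)
    theta = cov(x, y) / var(x)
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    theta = cov_xy / var_x
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]

# Toy data where the outcome is perfectly predicted by the covariate.
pre_cancels = [1.0, 2.0, 3.0, 4.0]
outcome = [3.0, 5.0, 7.0, 9.0]
adjusted = cuped_adjust(outcome, pre_cancels)
```

In this deliberately extreme toy case the adjusted outcomes are all equal to the original mean, so the variance drops to zero; real covariates explain only part of the variance.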
- Monitoring and Rollout
  - Propose sequential harm boundaries (e.g., alpha-spending).
  - Specify a rollout rule using a cost-benefit model that includes driver earnings impacts and SLA guardrails.
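For the alpha-spending part, one widely used choice is the Lan-DeMets spending function of O'Brien-Fleming type, α(t) = 2 − 2Φ(z_{α/2} / √t), where t is the information fraction at each interim look. The sketch below only computes how much alpha is spent at each look; converting that into z-score boundaries requires numerical integration and is not shown. The four equally spaced looks are an assumption.

```python
import math
from statistics import NormalDist

def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative alpha spent at a given information fraction in (0, 1]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 - 2 * NormalDist().cdf(z / math.sqrt(info_fraction))

looks = [0.25, 0.5, 0.75, 1.0]  # hypothetical interim-look schedule
cumulative = [obf_alpha_spent(t) for t in looks]
# Alpha available at each look is the increment over the previous look.
incremental = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]
for t, cum, inc in zip(looks, cumulative, incremental):
    print(f"t={t:.2f}  cumulative={cum:.5f}  incremental={inc:.5f}")
```

This function spends very little alpha early, so an early stop for harm requires strong evidence while most of the error budget is preserved for the final analysis.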
- Deliverables
  - Provide a pre-registration outline (estimands, metrics, exclusions, missing-data rules, interim looks).
  - Provide a plan to report heterogeneity by city tier and weather without p-hacking.
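One guard against p-hacking in subgroup reporting is to pre-specify the subgroups and apply a multiplicity correction to their p-values. The sketch below implements Holm's step-down adjustment as one option among several (Benjamini-Hochberg is another); the p-values are invented for illustration.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical p-values for three pre-registered subgroups.
subgroup_p = [0.01, 0.04, 0.03]
adjusted_p = holm_adjust(subgroup_p)
```

Reporting all pre-registered subgroups with adjusted p-values, including the null ones, keeps the heterogeneity analysis confirmatory; anything outside the list is labeled exploratory.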