A delivery marketplace is piloting a new courier mode called "Biker" in dense urban zones to improve efficiency. Assume you have data access to orders, courier shifts, geospatial zones, payouts, and customer feedback. Design an evaluation plan focused on efficiency, and answer all sub-questions precisely.
Assumptions you must use:
- Treat "today" as 2025-09-01.
- Efficiency should reflect both speed and capacity utilization while preserving marketplace health.
Tasks:
- Define success: Propose 1 primary metric, 1–2 secondary metrics, and 3+ guardrails. For each, give the exact formula, event timestamp boundaries, unit of analysis, and recommended aggregation (mean vs p95, etc.). Examples you may consider: p95 delivery time from courier-accept to dropoff; orders-per-courier-hour; cost per delivered order; ETA accuracy; cancellation rate; customer/courier CSAT; supply utilization.
- Long-term signal: Specify one long-term metric that captures durable efficiency (e.g., 30-day repeat purchase rate or courier retention). Describe its measurement window, required sample size inflation vs. short-term, and why it’s less gameable than short-term metrics.
- Experiment design: Choose the experimental unit (order-level vs courier-level vs zone-time cluster). Justify your choice considering interference/network effects, shared supply, and spillovers. Define stratification variables (city, zone density, time-of-day, weather) and a ramp plan that respects courier capacity limits.
- Power and duration: Outline how you would estimate MDE and runtime using realistic baselines for your primary metric. State the minimum runtime you’d enforce to cover weekly seasonality.
- Data sourcing: List the concrete data you would query (tables/fields you expect to exist) to compute each metric, including how to obtain zone-level supply (courier online minutes), routing/assignment timestamps, and incentive/payout amounts.
- Analysis plan: Pre-register hypotheses and decision thresholds (e.g., ship if primary improves by ≥X% with all guardrails within ±Y). Specify how you will handle missing data, late deliveries that straddle day boundaries, and users/couriers who cross zones.
- If an A/B test is infeasible, propose a difference-in-differences or synthetic control design at the zone level. Define treated vs control, matching variables, and placebo/parallel-trend checks.
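For the zone-level fallback in the last task, one possible shape of the estimate is a plain 2x2 difference-in-differences on zone-week aggregates, with a placebo run on pre-period data as a rough parallel-trend check. The sketch below is illustrative only: the row layout `(zone_id, treated, week, value)`, the function names, and the toy numbers are all assumptions, not fields or results from any real dataset.

```python
from statistics import mean

def did_estimate(rows, cutoff_week):
    """2x2 difference-in-differences on zone-week rows.

    rows: iterable of (zone_id, treated, week, value); hypothetical layout.
    Weeks >= cutoff_week count as the post period.
    """
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for _zone_id, treated, week, value in rows:
        cells[(bool(treated), week >= cutoff_week)].append(value)
    m = {key: mean(vals) for key, vals in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change

def placebo_did(rows, cutoff_week, fake_cutoff_week):
    """Re-run the estimate on pre-period rows only, with a fake launch week.

    An estimate far from zero suggests the zones were already diverging.
    """
    pre_rows = [r for r in rows if r[2] < cutoff_week]
    return did_estimate(pre_rows, fake_cutoff_week)

# Toy data (made up): one treated and one control zone, launch at week 5.
rows = (
    [("zone_a", True, w, 10.0) for w in range(1, 5)]
    + [("zone_a", True, w, 12.0) for w in range(5, 7)]
    + [("zone_b", False, w, 8.0) for w in range(1, 5)]
    + [("zone_b", False, w, 9.0) for w in range(5, 7)]
)
effect = did_estimate(rows, cutoff_week=5)
placebo = placebo_did(rows, cutoff_week=5, fake_cutoff_week=3)
```

A real analysis would add zone and week fixed effects and cluster standard errors by zone; this version only shows the arithmetic behind the point estimate and the placebo check.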
Provide a concise, defensible plan with concrete formulas and design choices.
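
As an illustration of what "concrete formulas" could look like, the sketch below computes two of the example metrics named in the tasks (p95 delivery time and orders-per-courier-hour) and a minimum detectable effect for a mean-type metric under cluster randomization. Everything here is an assumption made for illustration: the function names, the nearest-rank percentile convention, and the design-effect formula `1 + (m - 1) * ICC` with equal cluster sizes. The MDE formula applies to differences in means; a p95 metric would more plausibly need a bootstrap or simulation.

```python
import math
from statistics import NormalDist

def p95_nearest_rank(durations_min):
    """p95 of accept-to-dropoff durations using the nearest-rank definition."""
    s = sorted(durations_min)
    rank = (95 * len(s) + 99) // 100  # integer ceil(0.95 * n)
    return s[rank - 1]

def orders_per_courier_hour(delivered_orders, online_minutes):
    """Delivered orders divided by courier online hours in the same window."""
    return delivered_orders / (online_minutes / 60.0)

def mde_clustered(sigma, n_per_arm, cluster_size, icc, alpha=0.05, power=0.8):
    """Two-sided MDE for a difference in means with a cluster design effect.

    sigma: unit-level standard deviation (assumed equal in both arms).
    cluster_size: average units per zone-time cluster (assumed equal sizes).
    icc: intra-cluster correlation of the metric.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    design_effect = 1 + (cluster_size - 1) * icc
    return z * math.sqrt(2 * sigma**2 * design_effect / n_per_arm)

# Hypothetical inputs, not real baselines.
p95 = p95_nearest_rank(list(range(1, 101)))
oph = orders_per_courier_hour(delivered_orders=30, online_minutes=600)
mde_independent = mde_clustered(sigma=10.0, n_per_arm=1000, cluster_size=1, icc=0.0)
mde_clusters = mde_clustered(sigma=10.0, n_per_arm=1000, cluster_size=50, icc=0.05)
```

Comparing `mde_independent` with `mde_clusters` shows why zone-time randomization typically needs a longer runtime than order-level randomization for the same sensitivity.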