This question evaluates experimental design and causal inference skills for a data scientist, including choice of randomization unit, contamination and clustering concerns, outcome and guardrail specification, measurement of treatment compliance, and statistical analysis for heterogeneity and variance estimation.

We want to evaluate whether providing couriers with thermal bags reduces customer refunds attributable to cold food.
Assumptions (minimal): The platform operates across multiple cities with couriers who may deliver in multiple zones and for many stores. Some couriers may not consistently use the bag even if provided.
Design a robust experiment that covers:
(a) Randomization unit (courier vs. store vs. zone), with justification considering network effects and contamination (e.g., couriers serving multiple zones; stores serving both arms).
(b) Outcome definitions: primary metric (refund cost per order attributable to cold food) and guardrails (delivery ETA accuracy, contact rate, re-order rate, courier supply hours).
(c) Stratification/clustered randomization across cities and peak vs. off-peak, and how to handle seasonality/holidays (e.g., staggered rollouts and/or difference-in-differences on pre-period trends).
(d) Noncompliance and measurement: when bags are not used, how we measure usage (telemetry/audit photos), and an encouragement design or IV strategy for estimating LATE.
(e) Analysis plan: CUPED/covariate adjustment (distance, cuisine, temperature, store), heterogeneity by cuisine and distance deciles, variance estimation with cluster-robust SEs, sequential monitoring rules, and pre-registered success thresholds.
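As a rough illustration of how parts (d) and (e) could fit together, the sketch below applies a CUPED adjustment and then a Wald (IV) estimate of the LATE, using random assignment as the instrument for bag usage. It assumes the data are already aggregated to one row per courier, taken here as the randomization unit, so clustering is handled by aggregation rather than by cluster-robust SEs. The function names, the 70% compliance rate, and the simulated effect size are hypothetical choices for the example, not values from the problem.

```python
import random
from statistics import mean


def cuped_adjust(y, x):
    """CUPED: subtract theta * (x - mean(x)), where x is a pre-period covariate."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    theta = cov_xy / var_x if var_x > 0 else 0.0
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


def wald_late(z, d, y):
    """Wald estimator: ITT effect on the outcome / ITT effect on bag usage."""
    itt_y = (mean(yi for zi, yi in zip(z, y) if zi == 1)
             - mean(yi for zi, yi in zip(z, y) if zi == 0))
    first_stage = (mean(di for zi, di in zip(z, d) if zi == 1)
                   - mean(di for zi, di in zip(z, d) if zi == 0))
    if first_stage == 0:
        raise ValueError("assignment does not move usage; LATE is undefined")
    return itt_y / first_stage


def simulate(n_couriers=2000, seed=0):
    """Hypothetical courier-level data with one-sided noncompliance."""
    rng = random.Random(seed)
    z = [1] * (n_couriers // 2) + [0] * (n_couriers - n_couriers // 2)
    rng.shuffle(z)  # complete randomization; city/peak strata are omitted here
    x = [rng.gauss(0.50, 0.10) for _ in z]  # pre-period refund cost per order
    # Assumed: only assigned couriers can use a bag, and 70% of them do.
    d = [1 if zi == 1 and rng.random() < 0.7 else 0 for zi in z]
    # Assumed true effect of actually using the bag: -0.10 per order.
    y = [xi + rng.gauss(0.0, 0.05) - 0.10 * di for xi, di in zip(x, d)]
    return z, d, x, y


if __name__ == "__main__":
    z, d, x, y = simulate()
    print("LATE, raw outcome:  ", wald_late(z, d, y))
    print("LATE, CUPED outcome:", wald_late(z, d, cuped_adjust(y, x)))
```

Both estimates target the same effect among compliers. The CUPED version should typically be less noisy because the pre-period covariate absorbs between-courier variation. A full analysis plan would still need standard errors, stratum handling, and the sequential monitoring rules asked for in (e).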