Diagnose rising delivery cost precisely
Company: Intuit
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: HR Screen
A restaurant reports a significant increase in delivery cost per order over the last month. Delivery cost is defined strictly as the cost incurred from courier pickup at the restaurant to drop-off at the customer: courier fees, platform fees, and delivery-related refunds/adjustments. It explicitly excludes food ingredients, kitchen labor, packaging materials, and dine-in costs. You’re given the following historical data: orders(order_id, order_ts, customer_id, store_id, distance_km, promised_eta_min, actual_eta_min, platform, city_zone, courier_id, delivered_bool, cancel_reason), courier_payments(courier_id, order_id, pay_amount, surge_multiplier), fees(order_id, platform_fee, refund_amount, promo_amount), weather_by_zone(date, city_zone, precip_mm, temp_c, wind_kph), and store_ops(store_id, open_hours_json, staffing_level, kitchen_prep_time_avg_min).
Design a rigorous analysis plan to:
1) Validate the metric definition and confirm the increase is not due to scope creep or denominator changes.
2) Decompose delivery cost per order by drivers (mix: platform, distance bands, zones, time-of-day; rate: pay per km/min, surge; execution: cancellations, reattempts, SLA breaches).
3) Identify causal hypotheses (e.g., weather shocks, staffing shifts increasing wait-at-pickup, platform policy changes) and quantify each via appropriate methods (e.g., difference-in-differences across unaffected zones, regression with fixed effects, or event studies).
4) Propose at least two actionable experiments to reduce cost (e.g., batching, zone remapping, pickup-wait time SLA). For each, specify success metrics, guardrail metrics, a power calculation, and expected sources of bias.
5) List common pitfalls specific to this case (e.g., inadvertently including packaging/food costs, counting tips as costs, double-counting refunded orders, survivorship bias from excluding undelivered orders), and how you will avoid them.
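One way to approach the mix-versus-rate split in item 2 is a two-term decomposition: the mix effect is the change in each segment's order share valued at the old cost per order, and the rate effect is the change in each segment's cost per order weighted by the new share. The sketch below is a minimal illustration, not a prescribed solution. The function name, the distance bands, and all numbers are hypothetical, and it assumes delivery cost has already been aggregated per segment from courier_payments and fees.

```python
def mix_rate_decomposition(before, after):
    """Split the change in cost per order into mix and rate effects.

    before / after: dict mapping segment -> (orders, total_delivery_cost).
    Assumes both periods contain the same segments, each with at least
    one order.
    """
    if set(before) != set(after):
        raise ValueError("both periods need the same segments")
    n0 = sum(orders for orders, _ in before.values())
    n1 = sum(orders for orders, _ in after.values())
    mix = rate = 0.0
    for seg in before:
        o0, c0 = before[seg]
        o1, c1 = after[seg]
        w0, w1 = o0 / n0, o1 / n1   # share of orders in each period
        r0, r1 = c0 / o0, c1 / o1   # cost per order within the segment
        mix += (w1 - w0) * r0       # share shift valued at the old rate
        rate += w1 * (r1 - r0)      # rate shift weighted by the new share
    return {"mix": mix, "rate": rate, "total": mix + rate}


# Hypothetical distance-band data, for illustration only.
before = {"0-3km": (600, 2400.0), "3-6km": (300, 1800.0), "6km+": (100, 900.0)}
after = {"0-3km": (500, 2000.0), "3-6km": (300, 1950.0), "6km+": (200, 1900.0)}
print(mix_rate_decomposition(before, after))
```

With these made-up inputs, cost per order rises from 5.10 to 5.85; about 0.50 of the 0.75 increase comes from orders shifting toward the longest band and about 0.25 from higher cost within bands. The two terms sum exactly to the total change, which makes the split easy to sanity-check against the headline metric.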
Quick Answer: This question evaluates a data scientist's competency in metric validation, driver decomposition, causal inference, and experimental design for operational delivery cost analysis.
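For the power calculations requested in item 4, a rough starting point is the standard normal-approximation sample size for comparing mean cost per order between two arms. The sketch below is a minimal illustration under stated assumptions: the standard deviation and minimum detectable difference are hypothetical inputs, and the `design_effect` parameter is an assumed inflation factor for orders clustered by courier or zone, not something given in the prompt.

```python
import math
from statistics import NormalDist


def orders_per_arm(sigma, min_detectable_diff, alpha=0.05, power=0.80,
                   design_effect=1.0):
    """Approximate orders needed per arm for a two-sided two-sample test.

    sigma: assumed standard deviation of delivery cost per order.
    min_detectable_diff: smallest cost reduction worth detecting.
    design_effect: assumed inflation for clustered randomization
        (1.0 means orders are treated as independent).
    """
    if sigma <= 0 or min_detectable_diff <= 0:
        raise ValueError("sigma and min_detectable_diff must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / min_detectable_diff ** 2
    return math.ceil(n * design_effect)


# Hypothetical inputs: sd of 2.50 per order, detect a 0.25 reduction.
print(orders_per_arm(sigma=2.5, min_detectable_diff=0.25))
```

With these illustrative inputs the estimate is on the order of 1,600 orders per arm. If the experiment randomizes by zone or courier rather than by order, the required sample grows with the design effect, so the independent-orders figure is best read as a lower bound.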