Marketplace Metric Frameworks
Asked of: Data Scientist

What's being tested
DoorDash is probing whether you can build a marketplace measurement framework for a three-sided system: consumers, Dashers, and merchants. Strong answers connect business goals like completed_orders, gross_order_value, and contribution_profit to operational realities like Dasher supply, delivery reliability, conversion funnels, and merchant availability. The interviewer is not looking for a generic metric list; they are testing whether you can define the right primary metric, choose meaningful guardrails, segment intelligently, and design an experiment or quasi-experiment that supports causal claims. DoorDash cares because marketplace changes often help one side while hurting another, so a Data Scientist must quantify tradeoffs rather than optimize a single surface metric.
Core knowledge
- North Star metrics should reflect marketplace value creation, not just activity. For DoorDash, likely candidates include completed_orders, order_volume, gross_order_value, delivery_success_rate, or contribution_profit, depending on the problem. Always clarify whether the goal is growth, reliability, profitability, or liquidity.
- Three-sided marketplace decomposition is essential. A change in completed_orders can be decomposed into consumer demand, merchant supply, and Dasher capacity. One common form, consistent with the worked example later in this guide, is $\text{completed\_orders} \approx \text{demand} \times \text{conversion rate} \times \text{fulfillment success rate}$. Fulfillment success itself depends on acceptance, pickup, delivery, cancellation, and lateness.
- Funnel metrics should separate intent from execution. For app install or ordering flows, track landing_page_views, click_to_app_store, install_start, install_complete, first_open, signup_complete, first_order_started, and first_order_completed. A DS should define denominators precisely because a 5% lift in install_rate may not translate to first_order_rate.
- Guardrail metrics protect marketplace health. Typical guardrails include cancellation_rate, lateness_rate, refund_rate, support_contact_rate, Dasher_utilization, merchant_order_error_rate, average_delivery_time, and consumer_NPS. A launch that increases demand but worsens delivery_success_rate may be a bad tradeoff.
- Segmentation is not optional in local marketplaces. Analyze by market, submarket, cuisine, time of day, weekday/weekend, new versus returning consumers, Dasher vehicle type, merchant density, and weather or event periods. DoorDash effects are often heterogeneous: Los Angeles lunch may behave differently from suburban dinner.
- Metric hierarchy helps communicate clearly. Use a tree: business outcome → driver metrics → diagnostic metrics. For example, completed_orders → checkout_conversion, available_merchants, average_delivery_fee, ETA, Dasher_supply_hours → assignment_rate, acceptance_rate, pickup_wait_time, merchant_prep_time.
- Experiment unit choice determines validity. Consumer-facing UI changes may randomize by user or session; Dasher incentives may require market-level or Dasher-level randomization; new-market rollouts often need geo-level designs. Watch for interference, where treatment changes supply-demand balance for untreated users in the same market.
- Causal inference methods matter when randomized tests are impractical. Use difference-in-differences when treated and control markets have parallel pre-trends, synthetic control for one or few treated geographies, and interrupted time series for broad launches. Always inspect pre-period trends and seasonality before claiming causality.
- Power and minimum detectable effect should be discussed at the metric level. A binary metric like conversion has approximate standard error $\sqrt{p(1-p)/n}$, where $p$ is the baseline rate and $n$ is the sample size; rare events like fraud or severe lateness need much larger samples. For geo experiments, effective sample size is the number of markets, not the number of users.
- Ratio metrics require care. Metrics like delivery_success_rate = successful_deliveries / accepted_orders can move because the numerator improves or because the denominator composition changes. Use numerator-denominator decomposition and consider delta-method, bootstrap, or cluster-robust standard errors.
- Marketplace tradeoffs should be made explicit. Lower delivery fees may increase conversion_rate but reduce contribution_margin; higher Dasher incentives may improve assignment_rate but hurt profitability. A strong DS frames success as a constrained optimization: maximize primary metric subject to guardrails staying within pre-defined thresholds.
- Anomaly diagnosis should start broad, then narrow. Validate whether the drop is real, identify affected segments, decompose the metric, generate hypotheses, and test them with historical comparisons, cohorts, and counterfactuals. Do not jump to one cause like “fewer Dashers” without checking demand, merchant availability, app changes, pricing, and external shocks.
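The point about power and rare events can be made concrete with a back-of-envelope calculation. The sketch below uses the standard normal-approximation sample-size formula for a two-sided test on two proportions. The function name, the 5% significance and 80% power defaults, and the baseline rates are illustrative assumptions, not DoorDash figures.

```python
import math
from statistics import NormalDist


def required_n_per_arm(p_base, rel_lift, alpha=0.05, power=0.80):
    """Approximate users per arm needed to detect a relative lift in a binary metric."""
    p_treat = p_base * (1 + rel_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances of the two arms
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_treat - p_base) ** 2)


# Hypothetical baselines: a common event (20% conversion) and a rare one (0.5% severe lateness)
common = required_n_per_arm(0.20, 0.05)
rare = required_n_per_arm(0.005, 0.05)
print(f"common metric: {common:,} per arm; rare metric: {rare:,} per arm")
```

For the same 5% relative lift, the rare metric needs far more users per arm than the common one, which is why rare-event guardrails are often monitored rather than powered for.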
Worked example
For “Diagnose completed orders drop in Los Angeles”, a strong candidate would start by clarifying: “Is this a sudden or sustained drop, relative to what baseline, and is it isolated to LA or visible in comparable markets?” They would define the target metric precisely, such as completed_orders by order date, and ask whether the drop is absolute volume, seasonally adjusted volume, or conversion-adjusted volume. The answer should be organized around four pillars: first, validate the data and scope; second, decompose completed_orders into demand, conversion, and fulfillment; third, segment by geography, time, customer cohort, merchant type, and Dasher supply; fourth, test hypotheses against controls or historical baselines.
A good decomposition might compare traffic, cart_start_rate, checkout_conversion, payment_success_rate, assignment_rate, cancellation_rate, and delivery_success_rate. If LA traffic is stable but checkout conversion is down, investigate fees, ETA, app changes, or restaurant availability; if conversion is stable but fulfillment is down, inspect Dasher supply, acceptance, weather, and merchant prep time. A key tradeoff to flag is speed versus rigor: for an urgent operational issue, you would first produce a directional decomposition within hours, then follow with causal validation using matched markets or pre/post analysis. The candidate should avoid claiming causality from correlation and instead say what evidence would distinguish competing hypotheses. A strong close: “If I had more time, I’d build a market-level counterfactual using similar cities and quantify how much each driver explains of the LA shortfall.”
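One way to produce the quick directional decomposition described above is a log decomposition of a multiplicative funnel. This is a minimal sketch that assumes completed_orders can be written as sessions × checkout_conversion × fulfillment_rate; the three-factor split, the function name, and every number below are hypothetical.

```python
import math


def log_decompose(before, after):
    """Split the log change in a product metric across its multiplicative drivers."""
    contributions = {k: math.log(after[k] / before[k]) for k in before}
    total = sum(contributions.values())
    # Share of the total log change attributable to each driver
    shares = {k: (c / total if total else 0.0) for k, c in contributions.items()}
    return total, shares


# Hypothetical LA weekly figures: traffic flat, conversion and fulfillment both down
before = {"sessions": 1_000_000, "checkout_conversion": 0.20, "fulfillment_rate": 0.95}
after = {"sessions": 1_000_000, "checkout_conversion": 0.18, "fulfillment_rate": 0.94}

total, shares = log_decompose(before, after)
pct_change = math.exp(total) - 1  # overall relative change in completed orders
print(f"completed orders change: {pct_change:.1%}")
for driver, share in shares.items():
    print(f"  {driver}: {share:.0%} of the change")
```

Because log changes add up exactly, the shares sum to one. This is an accounting split of where the drop shows up, not evidence of what caused it.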
A second angle
For “Boost App Installs: Analyze and Experiment with Conversion Funnel”, the same framework applies, but the object is a consumer acquisition funnel rather than an operating-marketplace anomaly. The primary metric might be first_order_completed_per_landing_page_visitor, not simply app_install_rate, because installs only matter if they produce marketplace demand. The experiment could randomize mobile web visitors into different app-install prompts, deep links, incentives, or landing-page designs, with guardrails like bounce_rate, web_order_conversion, and cost_per_incremental_first_order. The main constraint is attribution: users may switch devices, install later, or already have the app, so the DS must define eligible cohorts carefully. Unlike LA diagnostics, this is more amenable to user-level A/B testing, though interference can still occur if incremental orders affect Dasher capacity in constrained markets.
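A user-level readout for this kind of test could look like the sketch below: a pooled two-proportion z-test on first orders completed per eligible landing-page visitor. It assumes visitor-level randomization, one row per eligible visitor, and no interference; the function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(x_ctrl, n_ctrl, x_trt, n_trt):
    """Return (absolute lift, z statistic, two-sided p-value) for two conversion rates."""
    p_c, p_t = x_ctrl / n_ctrl, x_trt / n_trt
    p_pool = (x_ctrl + x_trt) / (n_ctrl + n_trt)  # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_ctrl + 1 / n_trt))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, z, p_value


# Hypothetical counts: first orders completed out of eligible visitors in each arm
lift, z, p_value = two_proportion_ztest(x_ctrl=1_500, n_ctrl=50_000, x_trt=1_650, n_trt=50_000)
print(f"absolute lift={lift:.4f}, z={z:.2f}, p={p_value:.4f}")
```

The same function can be run on each guardrail, but the eligibility filter (for example, excluding visitors who already have the app) has to be applied before counting either numerator or denominator.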
Common pitfalls
Pitfall: Optimizing one metric while ignoring marketplace balance.
A tempting answer is “success is more orders” or “success is more installs.” That is incomplete at DoorDash because demand growth can worsen ETA, lateness_rate, or cancellation_rate if Dasher supply and merchant capacity do not scale. A better answer names a primary metric and 3–5 guardrails with explicit thresholds or decision rules.
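An explicit decision rule can be written down before the experiment starts. The sketch below is one hypothetical way to encode it; the thresholds, the "hold / ship / iterate" labels, and the assumption that every guardrail is a lower-is-better metric are illustrative choices, not a documented DoorDash policy.

```python
from dataclasses import dataclass


@dataclass
class Guardrail:
    name: str
    max_relative_increase: float  # e.g. 0.02 tolerates up to a +2% relative increase


def launch_decision(primary_lift, primary_significant, guardrail_changes, guardrails):
    """Apply a pre-registered rule: any guardrail breach blocks the launch."""
    breaches = [g.name for g in guardrails
                if guardrail_changes[g.name] > g.max_relative_increase]
    if breaches:
        return "hold", breaches
    if primary_lift > 0 and primary_significant:
        return "ship", []
    return "iterate", []


# Hypothetical thresholds and observed relative changes
guardrails = [Guardrail("cancellation_rate", 0.02), Guardrail("lateness_rate", 0.03)]
observed = {"cancellation_rate": 0.01, "lateness_rate": 0.05}
print(launch_decision(0.015, True, observed, guardrails))
```

Here the primary metric improves, but lateness_rate rises past its threshold, so the rule returns a hold that names the breached guardrail.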
Pitfall: Treating every analysis as a simple A/B test.
Many marketplace programs, especially new-market launches or Dasher supply changes, cannot be cleanly randomized at the user level. If treatment changes local liquidity, untreated users may be affected too, which violates the stable unit treatment value assumption (SUTVA) that one unit's outcome does not depend on another unit's assignment. Strong candidates discuss geo experiments, difference-in-differences, synthetic controls, or staged rollouts when randomization is constrained.
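When only a geo-level comparison is available, the simplest difference-in-differences estimate is the treated market's pre/post change minus the control market's pre/post change. The sketch below assumes parallel trends hold, and the weekly figures (completed orders in thousands) are made up for illustration.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated market minus change in the control market."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


def pre_period_gaps(treated_pre, control_pre):
    """Treated-minus-control gap per pre-period; a drifting gap suggests non-parallel trends."""
    return [t - c for t, c in zip(treated_pre, control_pre)]


# Hypothetical weekly completed orders, in thousands
treated_pre, treated_post = [100, 102, 101], [110, 112, 111]
control_pre, control_post = [80, 81, 82], [84, 85, 86]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
print("pre-period gaps:", pre_period_gaps(treated_pre, control_pre))
print("DiD estimate (thousands of orders per week):", effect)
```

With only a handful of markets, the uncertainty on this estimate comes from the number of markets rather than the number of orders, so a point estimate like this should be reported with that caveat.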
Pitfall: Listing metrics without a causal story.
Interviewers are not impressed by a long dashboard inventory. They want to see how metrics connect: if completed_orders drops, what driver changed, what hypotheses explain that driver, and what evidence would confirm or reject each hypothesis? Use a metric tree and narrate the investigation path.
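A metric tree can be written down as a small nested structure so that each investigation path is explicit. In the sketch below, the top-level drivers come from the hierarchy in this guide, but which diagnostic metric sits under which driver is an assumption made for illustration.

```python
# Business outcome -> driver metrics -> diagnostic metrics.
# The placement of each diagnostic under a driver is hypothetical.
METRIC_TREE = {
    "completed_orders": {
        "checkout_conversion": {},
        "available_merchants": {"merchant_prep_time": {}},
        "average_delivery_fee": {},
        "ETA": {"pickup_wait_time": {}},
        "Dasher_supply_hours": {"assignment_rate": {}, "acceptance_rate": {}},
    }
}


def paths(tree, prefix=()):
    """Yield every root-to-leaf path in a nested-dict metric tree."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from paths(children, path)
        else:
            yield path


for path in paths(METRIC_TREE):
    print(" → ".join(path))
```

Each printed path is one line of investigation to narrate: which driver moved, and which diagnostic would confirm or reject the hypothesis behind it.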
Connections
Interviewers may pivot from marketplace metrics into experiment design, especially unit of randomization, power, and guardrail interpretation. They may also move into causal inference, funnel analytics, cohort analysis, or ranking/recommendation evaluation if the marketplace change involves search results, ETA ranking, promotions, or personalized incentives.
Further reading
- Trustworthy Online Controlled Experiments by Kohavi, Tang, and Xu: Practical reference for A/B testing, guardrails, ratio metrics, and experimentation pitfalls.
- Mostly Harmless Econometrics by Angrist and Pischke: Strong foundation for difference-in-differences, instrumental variables, and causal reasoning in observational settings.
- Causal Inference: The Mixtape by Scott Cunningham: Accessible treatment of modern causal methods, including synthetic control and event studies useful for geo-market analysis.
Practice questions
- Diagnose completed orders drop in Los Angeles (DoorDash · Data Scientist · Onsite · medium)
- Investigate LA Order Drop (DoorDash · Data Scientist · Technical Screen · medium)
- Investigate Falling Successful Orders (DoorDash · Data Scientist · Technical Screen · hard)
- Diagnose Decline in Successful Orders (DoorDash · Data Scientist · Technical Screen · medium)
- Design analytics for a new-market launch (DoorDash · Data Scientist · Onsite · hard)
- Diagnose LA completed-order drop and design experiment (DoorDash · Data Scientist · Technical Screen · hard)
- Identify Key Drivers of Delivery Decline in Los Angeles (DoorDash · Data Scientist · Technical Screen · hard)
- Determine Success Metrics for Biker Dasher Program Launch (DoorDash · Data Scientist · Technical Screen · medium)
- Diagnose Decline in Delivery Success: Data, Hypotheses, Tests (DoorDash · Data Scientist · Technical Screen · medium)
- Evaluate Key Metrics for Biker-Dasher Program Success (DoorDash · Data Scientist · Technical Screen · medium)
- Boost App Installs: Analyze and Experiment with Conversion Funnel (DoorDash · Data Scientist · Onsite · medium)
Related concepts
- Product Metric Frameworks And Diagnostic Analytics (Analytics & Experimentation)
- Product Metrics And Marketplace Diagnostics (Analytics & Experimentation)
- Product Metric Design And Diagnostic Deep Dives (Analytics & Experimentation)
- Product Metrics, Funnels, And Segmentation (Analytics & Experimentation)
- Product Metrics, Guardrails, And Retention (Analytics & Experimentation)
- DoorDash Three-Sided Marketplace Segmentation and Diagnostics