PM Technical Fundamentals for Growth Experimentation
Asked of: Product Manager
What's being tested
Interviewers are probing whether a Product Manager can use experimentation to make disciplined growth decisions without overstepping into data science implementation. For DoorDash, this matters because growth levers like referrals, promotions, onboarding, DashPass, search ranking, and reactivation can move one side of the marketplace while harming another. A strong answer shows you can define the customer problem, choose the right metric stack, reason about causality, and make a launch/no-launch call under uncertainty. The interviewer is not looking for deep statistical derivations; they are looking for whether you can partner intelligently with DS, Engineering, Marketing, and Ops.
Core knowledge
- A/B testing is the default tool when you can randomly expose comparable users to treatment and control. A PM should specify the hypothesis, eligibility, user experience, primary metric, guardrails, duration, and decision rule before launch.
- Growth experiments should tie to a clear funnel step: acquisition, activation, conversion, retention, referral, or resurrection. For DoorDash, examples include new_user_order_rate, first_to_second_order_conversion, DashPass_trial_start_rate, promo_redemption_rate, and 30_day_retention.
- Primary metrics should reflect the intended product outcome, not just the easiest click to measure. If testing a new referral incentive, referral_invites_sent is a weak primary metric; referred_users_first_order_completed or incremental new_customer_orders is closer to business value.
- Guardrail metrics protect against marketplace damage. DoorDash PMs should monitor metrics like delivery_lateness, cancellation_rate, Dasher_acceptance_rate, merchant_prep_delay, support_contact_rate, contribution margin, and promo abuse rate alongside consumer growth metrics.
- Randomization unit matters. User-level randomization works for many consumer-facing flows, but geo-level or market-level randomization may be needed when there is interference, such as promotions affecting Dasher supply, merchant load, delivery times, or local marketplace liquidity.
- Causal inference starts with asking, “What would have happened without this change?” Random assignment helps isolate the treatment effect, while pre/post comparisons are vulnerable to seasonality, holidays, weather, competitor promos, and payday effects.
- Incrementality is critical for promotions. A 20% lift in promo redemptions is not meaningful if users would have ordered anyway. PMs should ask for incremental orders, incremental gross profit, and cannibalization of full-price orders, not just redemption volume.
- Statistical significance is not the same as business significance. A tiny lift can be statistically significant at large sample sizes but economically unattractive. A useful framing is to ask how large the lift is in absolute terms, then compare it to cost, operational risk, and strategic value.
- Power and duration are planning constraints, not afterthoughts. A PM does not need to compute every formula live, but should know that low-baseline metrics need larger samples, small expected effects require longer tests, and stopping early after a “good-looking” day increases false positives.
- Segmentation helps explain heterogeneous impact. DoorDash growth tests often differ by new vs. existing users, urban vs. suburban markets, high-frequency vs. low-frequency consumers, DashPass vs. non-DashPass, cuisine type, and supply-constrained vs. demand-constrained geographies.
- Experiment contamination can invalidate results. Examples include users sharing promo codes across groups, households seeing different treatments, merchants changing behavior after noticing demand shifts, or paid marketing campaigns overlapping with the test.
- Launch decisions should combine metric evidence with product judgment. A PM should be ready to recommend launch, iterate, rollback, expand to more markets, or run a follow-up test based on impact size, guardrails, confidence, customer feedback, and operational readiness.
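The power-and-duration point can be made concrete with the textbook normal-approximation formula for a two-sided, two-proportion test. This is a planning sketch under stated assumptions, not a replacement for a DS team's tooling. The function names, the 4% baseline, the 5% relative minimum detectable effect (MDE), and the 20,000 eligible users per day are all hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per arm to detect a relative lift in a conversion rate.

    Normal approximation for a two-sided, two-proportion z-test
    (no continuity correction, no variance reduction).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_needed(n_per_arm: int, eligible_users_per_day: int, arms: int = 2) -> int:
    """Calendar days to fill every arm, assuming traffic is split evenly."""
    return ceil(n_per_arm * arms / eligible_users_per_day)


# Hypothetical plan: 4% baseline first-order rate, hoping to detect a 5% relative lift.
n = sample_size_per_arm(baseline=0.04, relative_mde=0.05)
print(n, days_needed(n, eligible_users_per_day=20_000))
```

With these made-up inputs the estimate is roughly 154,000 users per arm, or about 16 days of traffic. The required sample scales with the inverse square of the absolute effect, so halving the MDE roughly quadruples it. This is why small expected effects need longer tests.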
Worked example
For “How would you evaluate a new referral incentive for DoorDash?”, a strong candidate would first frame the goal: “Are we trying to acquire net-new consumers, reactivate lapsed users, increase order frequency, or lower acquisition cost versus paid channels?” They would clarify the incentive mechanics, eligibility, markets, fraud risk, and whether both referrer and referee receive value.
The answer skeleton should have four pillars: hypothesis, experiment design, metric stack, and decision rule. The hypothesis might be: increasing the referee discount from $10 to $15 will increase incremental first orders enough to offset higher promo cost. The experiment could randomize eligible existing consumers into current referral offer vs. higher referral offer, while ensuring referred users are attributable and excluding employees, suspicious accounts, or users already targeted by another acquisition campaign.
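The assignment step in this design can be sketched with deterministic hashing. Each eligible user ID is hashed together with an experiment-specific key, so a given user always sees the same offer. Everything named here is hypothetical: the eligibility flags, the experiment key, and the 50/50 split. In practice the team's experimentation platform would own this logic.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Consumer:
    # Hypothetical eligibility flags; real ones would come from fraud and CRM systems.
    user_id: str
    is_employee: bool = False
    is_suspicious: bool = False
    in_other_acquisition_campaign: bool = False


def is_eligible(c: Consumer) -> bool:
    """Exclude employees, suspicious accounts, and users in overlapping campaigns."""
    return not (c.is_employee or c.is_suspicious or c.in_other_acquisition_campaign)


def assign_arm(c: Consumer, experiment_key: str = "referral_referee_15_vs_10",
               treatment_share: float = 0.5) -> Optional[str]:
    """Return 'treatment' ($15 offer), 'control' ($10 offer), or None if excluded."""
    if not is_eligible(c):
        return None
    digest = hashlib.sha256(f"{experiment_key}:{c.user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 16**8  # first 8 hex digits mapped to [0, 1)
    return "treatment" if bucket < treatment_share else "control"


for user in [Consumer("u1"), Consumer("u2", is_employee=True), Consumer("u3")]:
    print(user.user_id, assign_arm(user))
```

Salting the hash with the experiment key keeps bucket assignments independent across concurrent experiments. Stable assignment also removes one source of contamination: a user who reopens the app does not flip between the $10 and $15 offers.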
The primary metric should be incremental referred users completing a first order, not just invites sent. Guardrails should include contribution margin, promo abuse, duplicate accounts, referral conversion quality, 30_day_retention, cancellation rate, and support contacts. One tradeoff to flag explicitly: a richer incentive may grow top-line orders faster but attract lower-intent users who churn after the discount, hurting long-term payback.
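The incrementality and payback tradeoff can be checked with simple arithmetic once DS provides per-arm counts. The sketch below is hypothetical in both structure and numbers. It scales control to the treatment arm's size. It also assumes that users who would have converted under the $10 offer retain at the same rate in both arms, which is what lets it back out retention for the incremental users alone.

```python
from dataclasses import dataclass


@dataclass
class ArmResult:
    # Hypothetical per-arm readout for the referral test.
    eligible_referrers: int
    referred_first_orders: int
    referee_reward: float   # dollars paid per referred first order
    retained_30d: int       # referred users who ordered again within 30 days


def incremental_readout(control: ArmResult, treatment: ArmResult) -> dict:
    """Incremental first orders, extra promo cost, and retention of incremental users."""
    scale = treatment.eligible_referrers / control.eligible_referrers
    control_orders = control.referred_first_orders * scale
    incremental_orders = treatment.referred_first_orders - control_orders
    # The richer reward is paid on every treatment first order, including
    # referees who would have converted at the old reward anyway.
    incremental_cost = (treatment.referred_first_orders * treatment.referee_reward
                        - control_orders * control.referee_reward)
    # Assumes would-convert-anyway users retain equally in both arms.
    incremental_retained = treatment.retained_30d - control.retained_30d * scale
    positive = incremental_orders > 0
    return {
        "incremental_first_orders": incremental_orders,
        "incremental_promo_cost": incremental_cost,
        "cost_per_incremental_first_order":
            incremental_cost / incremental_orders if positive else float("inf"),
        "control_30d_retention": control.retained_30d / control.referred_first_orders,
        "treatment_30d_retention": treatment.retained_30d / treatment.referred_first_orders,
        "incremental_30d_retention":
            incremental_retained / incremental_orders if positive else None,
    }


control = ArmResult(eligible_referrers=100_000, referred_first_orders=2_000,
                    referee_reward=10.0, retained_30d=900)
treatment = ArmResult(eligible_referrers=100_000, referred_first_orders=2_400,
                      referee_reward=15.0, retained_30d=960)
print(incremental_readout(control, treatment))
```

With these made-up numbers, the $15 offer buys 400 extra first orders at $40 of extra promo spend each, because the higher reward is also paid to the 2,000 referees who would have converted anyway. Blended retention only drops from 45% to 40%, but the incremental users retain at 15%. That gap is exactly the low-intent churn risk the tradeoff describes.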
A strong close would be: “If the test improves incremental first orders and early retention while staying within CAC and margin thresholds, I would ramp by market and monitor fraud. If lift is concentrated only among low-retention users, I would test targeting the offer to high-LTV referrers or reducing the reward amount.”
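This close is easiest to defend when the decision rule is written down before the readout. Below is a minimal sketch using hypothetical readout fields and thresholds. The CAC ceiling, the 2-point retention floor, and the 80% concentration cutoff are placeholders a team would set with Finance and DS.

```python
from dataclasses import dataclass


@dataclass
class Readout:
    # Hypothetical fields a DS partner might summarize after the referral test.
    lift_ci_low: float                  # lower CI bound on relative lift in incremental first orders
    cost_per_incremental_order: float   # dollars
    retention_delta: float              # treatment minus control 30-day retention (0.01 = 1 point)
    margin_guardrail_ok: bool
    fraud_guardrail_ok: bool
    low_retention_share_of_lift: float  # share of incremental orders from low-retention users


def decide(r: Readout, cac_ceiling: float, min_retention_delta: float = -0.02,
           max_low_retention_share: float = 0.8) -> str:
    """Pre-registered launch rule; every threshold here is a placeholder."""
    if not (r.margin_guardrail_ok and r.fraud_guardrail_ok):
        return "roll back: guardrail breached"
    if r.lift_ci_low <= 0:
        return "do not launch: no reliable lift"
    if r.low_retention_share_of_lift >= max_low_retention_share:
        return "iterate: target high-LTV referrers or reduce the reward"
    if r.cost_per_incremental_order > cac_ceiling or r.retention_delta < min_retention_delta:
        return "iterate: lift is real but misses the CAC or retention bar"
    return "ramp by market and monitor fraud"


readout = Readout(lift_ci_low=0.03, cost_per_incremental_order=38.0, retention_delta=-0.01,
                  margin_guardrail_ok=True, fraud_guardrail_ok=True,
                  low_retention_share_of_lift=0.4)
print(decide(readout, cac_ceiling=45.0))
```

Checking guardrails first means a strong lift cannot override a fraud or margin breach. Each remaining branch maps to one of the actions a PM should be ready to recommend.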
A second angle
For “How would you test a new onboarding flow for first-time DoorDash users?”, the same experimentation fundamentals apply, but the constraint shifts from incentive economics to activation friction. The randomization unit can likely be user-level because the experience is contained within the app and less likely to affect local supply. The primary metric might be first_order_completion_rate or time_to_first_order, with guardrails like checkout abandonment, support contacts, refund rate, and delivery quality. Unlike a referral test, the PM should pay more attention to funnel diagnostics: sign-up completion, address entry, store browsing, cart creation, checkout start, and payment success. The launch decision should consider whether the flow improves conversion without hiding important information like fees, delivery ETAs, or substitution policies.
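These funnel diagnostics can be computed as step-to-step conversion by arm. That view shows where a new flow helps or hurts, not only whether the final order rate moved. The sketch assumes a hypothetical event log of (user_id, arm, step) rows. The step names are invented to mirror the funnel listed above and are assumed to occur in that order.

```python
from collections import defaultdict

# Hypothetical step names, assumed to occur in this order.
FUNNEL_STEPS = [
    "sign_up_completed", "address_entered", "store_browsed",
    "cart_created", "checkout_started", "payment_succeeded",
]


def funnel_by_arm(events: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, float]]]:
    """events are (user_id, arm, step) rows; returns step-to-step conversion per arm."""
    reached: dict[tuple[str, str], set[str]] = defaultdict(set)
    arms = set()
    for user_id, arm, step in events:
        reached[(arm, step)].add(user_id)
        arms.add(arm)

    report = {}
    for arm in sorted(arms):
        rows = []
        for prev, curr in zip(FUNNEL_STEPS, FUNNEL_STEPS[1:]):
            entered = reached[(arm, prev)]
            # Count only users who reached the previous step, so each rate is conditional.
            advanced = reached[(arm, curr)] & entered
            rate = len(advanced) / len(entered) if entered else 0.0
            rows.append((f"{prev} -> {curr}", rate))
        report[arm] = rows
    return report


events = [
    ("u1", "control", "sign_up_completed"), ("u1", "control", "address_entered"),
    ("u2", "control", "sign_up_completed"),
    ("u3", "treatment", "sign_up_completed"), ("u3", "treatment", "address_entered"),
]
for arm, rows in funnel_by_arm(events).items():
    print(arm, rows[0])
```

Reading the per-step rates side by side shows whether a change in first_order_completion_rate comes from, say, easier address entry or from users abandoning at checkout.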
Common pitfalls
Pitfall: Choosing vanity metrics as the primary success measure.
A tempting but weak answer is, “We’ll measure clicks, impressions, and promo claims.” Those can diagnose behavior, but they do not prove growth. A better PM answer anchors on incremental completed orders, retained users, gross profit, or marketplace health, then uses clicks and claims as secondary funnel metrics.
Pitfall: Treating every experiment like a simple user-level A/B test.
DoorDash is a three-sided marketplace, so consumer changes can affect Dashers and merchants. If a promotion spikes demand in one city, treatment users may worsen ETAs for control users, violating independence. A stronger answer calls out interference and considers market-level testing, phased rollouts, or analyzing supply-constrained markets separately.
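One way to handle interference is to randomize whole markets instead of users. Markets that look similar before the test are paired, so the comparison is not dominated by one unusually large city. The sketch below is a simplified matched-pair design. The market names and pre-period order counts are hypothetical. Real geo tests usually need more markets than this, pre-period balance checks, and analysis at the market level (the unit that was randomized), which is where DS involvement matters.

```python
import random


def paired_market_assignment(pre_period_orders: dict[str, float],
                             seed: int = 7) -> dict[str, str]:
    """Rank markets by pre-period volume, pair neighbors, and randomize within each pair."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    ranked = sorted(pre_period_orders, key=pre_period_orders.get, reverse=True)
    assignment = {}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        assignment[pair[0]] = "treatment"
        assignment[pair[1]] = "control"
    if len(ranked) % 2 == 1:
        # The unpaired market is held out rather than left unbalanced.
        assignment[ranked[-1]] = "holdout"
    return assignment


markets = {"metro_1": 50_000, "metro_2": 48_000, "metro_3": 20_000,
           "metro_4": 19_000, "metro_5": 5_000}
print(paired_market_assignment(markets))
```

Pairing before randomizing means each treatment market has a control market of similar size. This matters when only a handful of markets are available and a single large city could otherwise land in one arm by chance.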
Pitfall: Communicating like a statistician instead of a decision-maker.
Some candidates over-index on p-values, confidence intervals, or sample-size math without making a recommendation. PMs should show statistical literacy, but the final answer needs a business decision: launch, do not launch, iterate, expand, or investigate. The best responses say what evidence would change their mind.
Connections
Interviewers may pivot from growth experimentation into metric design, marketplace dynamics, promotion economics, pricing, customer segmentation, or launch strategy. Be prepared to connect experiment results to DoorDash-specific tradeoffs: consumer demand, Dasher supply, merchant operations, delivery quality, and contribution margin.