Compute power and interpret guardrails
Company: DoorDash
Role: Data Scientist
Category: Statistics & Math
Difficulty: hard
Interview Round: Onsite
An A/B test of a new search ranking shipped for one week across 100 DMAs (50 control, 50 treatment). Summaries:
• Orders: control=1,000,000; treatment=1,050,000 (accrued under DMA-level randomization; assume no SRM unless a check flags it).
• Mean delivery time (minutes): control=32.4 (SD=9.1), treatment=31.9 (SD=9.5).
• Cancellation rate: control=3.2%, treatment=3.5%.
• Baseline conversion: 15% (per session), target MDE=+0.3 pp.
• ICC across stores within a DMA: 0.15.
Tasks:
1) Compute the difference in mean delivery time and a 95% CI using a cluster-robust approach at the DMA level. State the exact estimator and SE formula you use and report the test statistic and p-value.
2) Check for SRM: run a chi-squared test on assignment counts using per-DMA exposure. What threshold flags SRM at α=0.05? How would you diagnose if flagged?
3) Guardrail interpretation: despite faster delivery, cancellations rose by 0.3 pp. Conduct a two-proportion z-test (and a cluster-adjusted variant). Quantify the practical significance (risk difference and relative risk) and whether this violates a pre-specified guardrail of “no increase >0.2 pp (95% CI).”
4) Power/MDE: With 50 DMAs per arm and the stated ICC, compute the design effect and the required per-DMA sample for detecting a +0.3 pp conversion lift at 80% power, α=0.05. Show formulas and numeric results.
5) Multiple metrics: You tracked 5 secondary metrics. Propose a Benjamini–Hochberg FDR=10% correction and illustrate with hypothetical p-values. When would you instead prefer Holm–Bonferroni?
6) Sensitivity: A mid-week outage hit 5 treatment DMAs. Explain a pre-registered diff-in-diff that uses last week as pre-period and weather/outage covariates, without introducing post-treatment bias. Include the regression specification with DMA and day fixed effects.
7) CUPED: Define a high-R² covariate (e.g., prior-week DMA mean delivery time) and write the CUPED-adjusted estimator for the treatment effect.
Quick Answer: This question evaluates experimental design and applied statistics for cluster-randomized A/B tests: cluster-robust inference, mean and proportion comparisons, power/MDE calculations with ICC and design effects, multiple-testing control (BH vs. Holm–Bonferroni), and sensitivity adjustments such as difference-in-differences and CUPED.
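A minimal sketch of Task 1's DMA-level approach: aggregate to per-DMA means and run a Welch t-test across clusters, which matches a cluster-robust SE when clusters are equally weighted. The per-DMA values below are simulated for illustration, since the prompt gives only pooled summaries.

```python
# Task 1 sketch: cluster-robust inference via DMA-level aggregation.
# NOTE: per-DMA means are SIMULATED for illustration; in practice you
# would compute the real mean delivery time for each of the 100 DMAs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control_dma = rng.normal(32.4, 1.5, size=50)  # hypothetical DMA means
treat_dma = rng.normal(31.9, 1.5, size=50)

diff = treat_dma.mean() - control_dma.mean()
# SE of a difference in cluster means (two independent samples of 50):
se = np.sqrt(treat_dma.var(ddof=1) / 50 + control_dma.var(ddof=1) / 50)
t_stat, p_val = stats.ttest_ind(treat_dma, control_dma, equal_var=False)
ci = (diff - 1.98 * se, diff + 1.98 * se)  # t_{0.975, df ~ 98} ~ 1.98
print(f"diff={diff:.3f} min, SE={se:.3f}, t={t_stat:.2f}, p={p_val:.3f}")
```

With only 100 clusters, the t reference distribution (not the normal) and the cluster-level SE are what make this inference honest; the pooled order-level SDs (9.1, 9.5) would badly overstate precision.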
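Task 2 can be sketched as a chi-squared goodness-of-fit test against the designed 50/50 split, illustrated here on the order totals, with an important caveat in the comments:

```python
# Task 2 sketch: SRM check via chi-squared goodness-of-fit against the
# designed 50/50 split. Caveat: total orders are an OUTCOME under this
# treatment, so a proper SRM check should use pre-exposure traffic
# (e.g., sessions) or the DMA assignment counts themselves.
from scipy import stats

observed = [1_000_000, 1_050_000]      # control, treatment orders
expected = [sum(observed) / 2] * 2     # 50/50 design split

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2={chi2:.1f}, p={p:.3g}")   # p < 0.05 would flag SRM
# With 1 df, the flag threshold is chi2 > 3.84 at alpha = 0.05.
```

If flagged, diagnose by segmenting the ratio over time (did it drift after the outage?), by DMA, and by platform/logging pipeline before trusting any outcome comparison.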
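For Task 3, a minimal two-proportion z-test on the pooled counts, plus the risk difference, relative risk, and the naive 95% CI checked against the 0.2 pp guardrail; the cluster-adjusted variant would inflate the SE by the square root of the design effect:

```python
# Task 3 sketch: two-proportion z-test on cancellation rates, with risk
# difference, relative risk, and a naive 95% CI vs. the guardrail of
# "no increase > 0.2 pp". A cluster-adjusted variant multiplies the SE
# by sqrt(DEFF), which can widen the CI substantially.
import math

n_c, n_t = 1_000_000, 1_050_000
p_c, p_t = 0.032, 0.035
p_pool = (p_c * n_c + p_t * n_t) / (n_c + n_t)

se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
z = (p_t - p_c) / se_pool                          # pooled-SE test stat
se_ci = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
rd = p_t - p_c                                     # +0.3 pp
rr = p_t / p_c                                     # ~1.09 relative risk
ci_lo = rd - 1.96 * se_ci                          # naive CI lower bound
print(f"z={z:.1f}, RD={rd:.4f}, RR={rr:.3f}, CI lower={ci_lo:.5f}")
# Naively ci_lo exceeds 0.002, so the guardrail is violated; the
# verdict can change once the SE is inflated for within-DMA correlation.
```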
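Task 4's design-effect arithmetic can be sketched numerically; the punchline is a feasibility check, because with a fixed number of clusters the effective sample size is bounded no matter how large each DMA grows:

```python
# Task 4 sketch: iid sample size for a +0.3 pp lift, then the cluster
# design effect DEFF = 1 + (m - 1) * ICC. Key point: with k clusters
# per arm, effective n = k * m / DEFF -> k / ICC as m grows, so check
# feasibility before solving for a per-DMA sample size m.
from scipy import stats

alpha, power = 0.05, 0.80
p1, p2 = 0.15, 0.153                      # baseline and +0.3 pp target
z_a = stats.norm.ppf(1 - alpha / 2)       # ~1.96
z_b = stats.norm.ppf(power)               # ~0.84

var_sum = p1 * (1 - p1) + p2 * (1 - p2)
n_unadj = (z_a + z_b) ** 2 * var_sum / (p2 - p1) ** 2   # per arm, iid
k, icc = 50, 0.15
eff_cap = k / icc                         # effective-n ceiling as m -> inf
print(f"n per arm (iid) ~ {n_unadj:,.0f}; effective-n cap = {eff_cap:.0f}")
# The cap (~333) is orders of magnitude below the required ~224k, so no
# per-DMA sample achieves 80% power for +0.3 pp with 50 DMAs per arm;
# you would need far more clusters, a larger MDE, or variance reduction.
```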
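Task 5's Benjamini–Hochberg step-up procedure at FDR q = 0.10, illustrated with five hypothetical p-values (invented for this sketch):

```python
# Task 5 sketch: Benjamini-Hochberg at FDR q = 0.10 on five hypothetical
# secondary-metric p-values (values invented for illustration).
import numpy as np

p = np.array([0.003, 0.012, 0.04, 0.06, 0.30])   # must be sorted ascending
q, m = 0.10, len(p)
thresholds = q * np.arange(1, m + 1) / m          # i * q / m
passing = np.nonzero(p <= thresholds)[0]
k = passing.max() + 1 if passing.size else 0      # largest i: p_(i) <= iq/m
reject = np.zeros(m, dtype=bool)
reject[:k] = True                                 # reject the k smallest
print(reject)                                     # here the 4 smallest pass
```

Prefer Holm–Bonferroni when you need familywise error control, i.e., when any single false positive is costly (guardrail-adjacent metrics, launch-blocking decisions); BH trades that strictness for power across many exploratory secondary metrics.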
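A sketch of Task 6's specification, y_dt = alpha_d + gamma_t + beta * (Treat_d x Post_t) + e_dt, estimated by OLS with dummy variables on simulated data. The outage would enter as a pre-specified indicator (DMA-by-day exposure, determined by the outage itself, not by outcomes), never by dropping DMAs based on post-treatment behavior:

```python
# Task 6 sketch: diff-in-diff with DMA and day fixed effects on
# SIMULATED data (100 DMAs x 14 days: pre-week + experiment week).
# Pre-registration fixes the covariates and the outage indicator up
# front, avoiding post-treatment conditioning.
import numpy as np

rng = np.random.default_rng(2)
n_dma, n_day = 100, 14
treat = np.repeat([0, 1], 50)             # first 50 control, last 50 treated
dma = np.repeat(np.arange(n_dma), n_day)  # panel indices
day = np.tile(np.arange(n_day), n_dma)
post = (day >= 7).astype(float)           # second week = experiment week
d_it = treat[dma] * post                  # DiD interaction term

beta_true = -0.5                          # simulated effect (minutes)
y = (rng.normal(32.4, 1.0, n_dma)[dma]    # DMA fixed effects
     + 0.1 * day                          # common day effects
     + beta_true * d_it
     + rng.normal(0.0, 0.3, n_dma * n_day))

# Design matrix: DMA dummies and day dummies (one dropped each to avoid
# collinearity with the intercept), the interaction, and an intercept.
X = np.column_stack([
    np.eye(n_dma)[dma][:, 1:],
    np.eye(n_day)[day][:, 1:],
    d_it,
    np.ones(len(y)),
])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0][-2]  # coef on d_it
print(f"DiD estimate ~ {beta_hat:.3f}")
```

In practice, SEs for beta_hat should again be clustered at the DMA level; weather and the outage indicator would be appended as extra columns.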
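Task 7's CUPED estimator uses theta = cov(Y, X) / var(X) with X = prior-week DMA mean delivery time, then compares arms on the adjusted outcome Y - theta * (X - mean(X)). Data below are simulated for illustration:

```python
# Task 7 sketch: CUPED with a high-R^2 pre-period covariate. Per-DMA
# values are SIMULATED; X is the prior-week DMA mean delivery time,
# which is unaffected by treatment, so adjusting by it is unbiased.
import numpy as np

rng = np.random.default_rng(1)
k = 50
x_c = rng.normal(32.5, 2.0, k)             # prior-week means, control
x_t = rng.normal(32.5, 2.0, k)             # prior-week means, treatment
y_c = 0.8 * x_c + rng.normal(6.4, 1.0, k)  # this-week outcome, corr. with X
y_t = 0.8 * x_t + rng.normal(5.9, 1.0, k)  # treatment ~0.5 min faster

y = np.concatenate([y_c, y_t])
x = np.concatenate([x_c, x_t])
theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)   # pooled OLS slope
y_cuped = y - theta * (x - x.mean())             # variance-reduced outcome
effect = y_cuped[k:].mean() - y_cuped[:k].mean()
print(f"theta={theta:.3f}, CUPED effect={effect:.3f}")
```

The variance of y_cuped shrinks by a factor of roughly (1 - R^2) relative to y, which is what tightens the CI on the treatment effect.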