This question evaluates a data scientist's competency in experimental design and statistical inference for A/B testing, covering hypothesis formulation, difference-in-proportions testing and confidence intervals, guardrail analysis, multiple-testing correction, interim alpha-spending approaches, and variance-reduction techniques such as CUPED.

What difficulty level is this interview question?

This is a hard-difficulty Statistics & Math question, commonly asked during Technical Screen rounds at Uber.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Uber during technical interviews.

Formulate hypotheses and compute A/B test significance

A/B Test Snapshot: Pickup ETA Card Experiment

You are analyzing a 7-day A/B test with equal allocation. Each request is an exposure; the primary outcome is completion per request. Two guardrails monitor safety/experience. Assume independent observations and large-sample approximations are acceptable.

Data (7-day snapshot):

Primary metric (trip completion rate per request):
- Control A: nA = 50,000 requests, cA = 6,000 completions
- Treatment B: nB = 50,000 requests, cB = 6,420 completions
Guardrail 1 (rider cancel rate per request):
- Control A: cancelsA = 4,500
- Treatment B: cancelsB = 4,950
Guardrail 2 (wait time, minutes per request):
- A: meanA = 4.8, sdA = 3.2, nA = 50,000
- B: meanB = 4.7, sdB = 3.4, nB = 50,000
There were 5 interim looks at equally spaced information times with no pre-registered alpha spending.
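
As a quick orientation before the tasks, the raw counts above can be turned into per-request rates and simple differences. The following is a minimal Python sketch; the variable names are illustrative and not part of the original problem statement, and it performs no significance testing.

```python
# Counts and summary statistics copied from the snapshot; names are illustrative.
n_a = n_b = 50_000

completions_a, completions_b = 6_000, 6_420
cancels_a, cancels_b = 4_500, 4_950
wait_mean_a, wait_mean_b = 4.8, 4.7  # minutes per request


def rate(count: int, n: int) -> float:
    """Per-request rate."""
    return count / n


completion_a, completion_b = rate(completions_a, n_a), rate(completions_b, n_b)
cancel_a, cancel_b = rate(cancels_a, n_a), rate(cancels_b, n_b)

# All differences are Treatment B minus Control A.
completion_diff = completion_b - completion_a
completion_rel = completion_diff / completion_a
cancel_diff = cancel_b - cancel_a
cancel_rel = cancel_diff / cancel_a
wait_diff = wait_mean_b - wait_mean_a

print(f"completion: A={completion_a:.4f} B={completion_b:.4f} "
      f"diff={completion_diff:+.4f} rel={completion_rel:+.1%}")
print(f"cancel:     A={cancel_a:.4f} B={cancel_b:.4f} "
      f"diff={cancel_diff:+.4f} rel={cancel_rel:+.1%}")
print(f"wait (min): A={wait_mean_a} B={wait_mean_b} diff={wait_diff:+.1f}")
```

With these numbers, completion moves from 12.00% to 12.84% (+0.84 percentage points, about +7% relative), cancellations move from 9.0% to 9.9% (+0.9 points, +10% relative), and mean wait time falls by 0.1 minutes. The primary metric and Guardrail 1 therefore move in opposite directions, which is the tension the final task asks about.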

Tasks:

1) State precise H0 and H1 for the primary metric; specify one- vs. two-sided and justify.
2) Choose the appropriate test for the primary metric (difference in proportions) and compute: test statistic, p-value, and a 95% CI for the lift. Show formulas and numeric results.
3) For Guardrail 2 (mean wait time), select the correct test (e.g., Welch’s t-test) and compute the 95% CI of the mean difference. State any distributional assumptions and why Welch vs. pooled.
4) Perform a multiple-testing correction across the three outcomes (Primary, Guardrail 1, Guardrail 2) using Holm–Bonferroni at familywise α = 0.05. Identify which effects remain significant.
5) Explain, in plain language, what the p-value you computed in (2) does and does not mean.
6) Given the unplanned 5 interim looks, re-evaluate significance using a simple Pocock or O’Brien–Fleming alpha-spending approach (outline the approach and provide an approximate adjusted conclusion; exact boundaries not required but justify your decision).
7) If pre-period completion rate per rider has correlation r = 0.40 with the in-experiment outcome, estimate the approximate variance reduction from CUPED and discuss how that would change required sample size or interpretation.
8) Conclude: ship, iterate, or stop? Defend your decision considering the guardrails.
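
The computational parts of the tasks (the tests, the Holm step-down, the interim-look check, and the CUPED estimate) can be sketched in Python using only the standard library. This is one possible approach under stated assumptions, not a reference solution: the function names are hypothetical, the Welch statistic is compared against a normal distribution because the degrees of freedom are very large at n = 50,000 per arm, the group-sequential critical values are approximate tabulated constants for five equally spaced looks, and the snapshot is assumed to be the fifth look.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return 2 * STD_NORMAL.cdf(-abs(z))


def two_prop_ztest(x_a, n_a, x_b, n_b):
    """Pooled z-test for p_B - p_A, plus an unpooled (Wald) 95% CI."""
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a
    p_pool = (x_a + x_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    se_wald = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = diff / se_pool
    return z, two_sided_p(z), (diff - Z95 * se_wald, diff + Z95 * se_wald)


def welch_large_sample(m_a, s_a, n_a, m_b, s_b, n_b):
    """Welch statistic for mean_B - mean_A, using a normal reference (huge df)."""
    diff = m_b - m_a
    se = math.sqrt(s_a**2 / n_a + s_b**2 / n_b)  # no equal-variance assumption
    t = diff / se
    return t, two_sided_p(t), (diff - Z95 * se, diff + Z95 * se)


def holm(p_values, alpha=0.05):
    """Holm step-down: sorted p-values vs alpha/m, alpha/(m-1), ..., alpha."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m, keep_rejecting, decisions = len(ordered), True, {}
    for i, (name, p) in enumerate(ordered):
        keep_rejecting = keep_rejecting and p <= alpha / (m - i)
        decisions[name] = keep_rejecting
    return decisions


N_REQ = 50_000
z_primary, p_primary, ci_primary = two_prop_ztest(6_000, N_REQ, 6_420, N_REQ)
z_cancel, p_cancel, ci_cancel = two_prop_ztest(4_500, N_REQ, 4_950, N_REQ)
t_wait, p_wait, ci_wait = welch_large_sample(4.8, 3.2, N_REQ, 4.7, 3.4, N_REQ)

holm_decisions = holm({"primary": p_primary, "cancel": p_cancel, "wait": p_wait})

# Assumed constants: approximate two-sided critical values for K = 5 equally
# spaced looks at overall alpha = 0.05 (Pocock is constant across looks;
# O'Brien-Fleming value shown is for the final look only).
POCOCK_Z = 2.413
OBF_FINAL_Z = 2.040
survives_pocock = abs(z_primary) > POCOCK_Z
survives_obf = abs(z_primary) > OBF_FINAL_Z

# CUPED: with covariate correlation r, variance shrinks by roughly r^2.
r = 0.40
variance_reduction = r**2              # fraction of variance removed
se_multiplier = math.sqrt(1 - r**2)    # factor applied to the standard error

print(f"primary: z={z_primary:.2f} p={p_primary:.1e} CI={ci_primary}")
print(f"cancel:  z={z_cancel:.2f} p={p_cancel:.1e} CI={ci_cancel}")
print(f"wait:    t={t_wait:.2f} p={p_wait:.1e} CI={ci_wait}")
print(holm_decisions, survives_pocock, survives_obf, variance_reduction)
```

Under these assumptions the primary metric gives z of about 4.03 with a two-sided p-value near 6e-5 and a 95% CI of roughly 0.43 to 1.25 percentage points for the absolute lift. The cancel-rate increase gives z of about 4.86, and the wait-time difference has a 95% CI of roughly -0.141 to -0.059 minutes. All three effects remain significant under Holm, and the primary statistic exceeds both group-sequential thresholds. Because the looks were not pre-registered, that last comparison is a conservative sanity check rather than a valid alpha-spending analysis. The CUPED figure (about a 16% variance reduction, so roughly 16% fewer requests for the same power) assumes the stated rider-level correlation carries over to the per-request analysis unit.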
