This question evaluates proficiency in statistical inference for A/B testing, covering confidence intervals, p-values, multiple-testing correction (Benjamini–Hochberg), effect-size interpretation, power/sample-size calculation, and guardrail risk assessment.

What difficulty level is this interview question?

This is a medium difficulty Statistics & Math question, commonly asked during HR Screen rounds at Upstart.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Upstart during technical interviews.

Interpret A/B results with p-values and uncertainty

A/B Test: Effect Sizes, CIs, Multiple Testing, Power, and Decision

Context: You ran a 14‑day experiment (2025‑08‑15 → 2025‑08‑28) with 1:1 allocation and equal sample sizes (N_control = 500,000 users; N_treatment = 500,000 users). Summary metrics are below. Effect sizes are reported as relative lifts for rate-like/ratio metrics and as absolute differences in percentage points (pp) for 7‑day retention.

Metrics (control → treatment):

- Sessions/user: 3.20 → 3.28; relative lift +2.5%; SE(lift) 1.2%; p = 0.032; desired direction: up; not a guardrail.
- 7‑day retention rate: 28.0% → 28.6%; absolute diff +0.6 pp; SE 0.35 pp; p = 0.078; desired up.
- Video CTR: 4.0% → 4.6%; relative lift +15.0%; SE 4.5%; p = 0.004; desired up.
- Hide rate: 1.80% → 2.05%; relative lift +13.9% (worse); SE 5.0%; p = 0.011; guardrail = yes.
- Time per session: 5.80 → 5.95 minutes; relative lift +2.6%; SE 1.5%; p = 0.092; desired up.
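
The effect sizes and standard errors listed above are enough to form normal-approximation (Wald) intervals. The sketch below is one way to do that, not a reference solution: it assumes each estimate is approximately normal, and the names `METRICS` and `normal_ci` are illustrative. Units follow the list: percent of relative lift for every metric except retention, which is in percentage points.

```python
from statistics import NormalDist

# (estimate, standard error) copied from the metric list.
# Units: relative lift in percent, except retention (percentage points).
METRICS = {
    "Sessions/user": (2.5, 1.2),
    "7-day retention": (0.6, 0.35),
    "Video CTR": (15.0, 4.5),
    "Hide rate": (13.9, 5.0),
    "Time per session": (2.6, 1.5),
}


def normal_ci(estimate, se, level=0.95):
    """Two-sided Wald interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return estimate - z * se, estimate + z * se


intervals = {name: normal_ci(est, se) for name, (est, se) in METRICS.items()}

for name, (lower, upper) in intervals.items():
    excludes_zero = lower > 0 or upper < 0
    print(f"{name}: [{lower:.2f}, {upper:.2f}]  excludes 0: {excludes_zero}")
```

With these inputs, the intervals for Sessions/user, Video CTR, and Hide rate exclude zero, while those for 7‑day retention and Time per session do not. That matches the direction of the quoted p-values at the 0.05 level.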

Tasks:

1) For each metric, construct a two‑sided 95% confidence interval using the provided effect size and SE, and interpret whether it excludes no effect.
2) Apply the Benjamini–Hochberg procedure at FDR 5% across the five p‑values. Which metrics remain significant? Show your steps.
3) Discuss statistical vs. practical significance for Video CTR and Sessions/user; include a back‑of‑the‑envelope estimate of incremental engaged sessions per day if rolled to 100% of US new users (state any reasonable assumption you need).
4) Hide rate is a guardrail and increased significantly. Quantify the expected absolute change (in pp) and discuss Type I/II risks, Type S/M errors, and whether this should block rollout despite other gains.
5) Power check: Assuming baseline 7‑day retention = 28% and target MDE = +0.5 pp absolute at α = 0.05 (two‑sided) and 80% power, estimate the required per‑variant sample size using a normal approximation. Is the current experiment sufficiently powered for that MDE?
6) Provide a concise go/no‑go recommendation with rationale and any follow‑up analyses you would run (e.g., heterogeneity by new vs. existing users, device, or pin_format).
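
The Benjamini–Hochberg and power-check tasks are both mechanical, so a small sketch may help when checking hand calculations. It is one possible approach under stated assumptions, not a model answer. The function names `benjamini_hochberg` and `required_n_per_variant` are illustrative. The sample-size formula is the common unpooled two-proportion normal approximation, n = (z_{α/2} + z_β)² · [p₁(1 − p₁) + p₂(1 − p₂)] / δ², which assumes independent users and equal allocation.

```python
import math
from statistics import NormalDist

# p-values copied from the metric list.
P_VALUES = {
    "Sessions/user": 0.032,
    "7-day retention": 0.078,
    "Video CTR": 0.004,
    "Hide rate": 0.011,
    "Time per session": 0.092,
}


def benjamini_hochberg(p_values, q=0.05):
    """Return the names rejected by the BH step-up rule at FDR level q."""
    m = len(p_values)
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        # Keep the largest rank whose p-value is at or below rank * q / m.
        if p <= rank * q / m:
            cutoff_rank = rank
    return {name for name, _ in ranked[:cutoff_rank]}


def required_n_per_variant(p_control, mde_abs, alpha=0.05, power=0.80):
    """Per-variant n for a two-sided two-proportion z-test (unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_treat = p_control + mde_abs
    variance_sum = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / mde_abs ** 2)


significant = benjamini_hochberg(P_VALUES)
n_required = required_n_per_variant(p_control=0.28, mde_abs=0.005)

# MDE implied by the reported retention SE of 0.35 pp at the same alpha and power.
z_total = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)
implied_mde_pp = z_total * 0.35

print(sorted(significant))
print(n_required)
print(round(implied_mde_pp, 2))
```

Under these assumptions, the step-up rule retains Video CTR and Hide rate, while Sessions/user (p = 0.032) narrowly misses its rank-3 threshold of 0.03. The formula asks for roughly 127,000 users per variant, which is below the 500,000 enrolled. The reported SE of 0.35 pp, however, implies an MDE closer to 1 pp. It is therefore worth checking how many users actually contribute to the retention metric before concluding the test is adequately powered.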
