Answer all parts concisely and with calculations where requested.

(a) Define statistical power for a two-proportion A/B test and list the primary levers that increase power, ranking them by typical practical impact (largest to smallest) and briefly explaining trade-offs. Include: effect size (MDE), variance/metric volatility, sample size, allocation ratio, alpha, variance reduction (e.g., CUPED), bucketing/stratification, and test duration/seasonality.

(b) Scenario: Baseline conversion p0 = 5.0%. Target relative lift = +7% (p1 = 5.35%). Two-sided alpha = 0.05, desired power = 0.80, equal allocation, independent users, no clustering. Compute the required sample size per variant and the minimum test duration (days) if you receive 80,000 eligible users/day with an expected 10% post-randomization attrition. Show formulas and numeric results.

(c) Recompute part (b) assuming CUPED with R² = 0.30 (i.e., a 30% relative variance reduction). What is the new sample size per variant and duration?

(d) How does switching to a 90/10 allocation (90% control, 10% treatment) affect power at fixed total traffic? Provide intuition and, if possible, a quantitative comparison to equal split.

(e) Your test, run for the duration from (b), returns a statistically significant −2% lift (treatment worse), contrary to your prior expectation of +7%. Outline a step-by-step diagnostic plan before drawing conclusions: include SRM checks (and why), instrumentation/metric definition audits, bot/geo/device imbalance, novelty/learning effects, outlier clipping, Simpson’s paradox via key segments, guardrail metrics, and peeking/stopping risk.

(f) After diagnostics, propose an evidence-based decision tree: when to (i) ship, (ii) iterate with a follow-up test (specify one design change), or (iii) rerun (state the precise condition that justifies a rerun).

This question evaluates a data scientist's mastery of A/B testing fundamentals — statistical power and sample-size calculations, effect-size and variance considerations, allocation strategies, variance-reduction methods such as CUPED, and experiment diagnostics including SRM, instrumentation audits, imbalance and segmentation checks.

What difficulty level is this interview question?

This is a medium difficulty Analytics & Experimentation question, commonly asked during HR Screen rounds at Thumbtack.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Thumbtack during technical interviews.

Explain power drivers and resolve unexpected A/B results

A/B Testing: Power, Sample Size, Allocation, and Diagnostics

You are analyzing a two-proportion (binary conversion) A/B test with independent users, no clustering/spillover, and equal exposure eligibility per day unless specified. Answer all parts concisely and show calculations where requested.

(a) Define Power and Rank the Levers

Define statistical power for a two-proportion A/B test and list the primary levers that increase power, ranking them by typical practical impact (largest to smallest). Briefly explain trade-offs. Include:

Effect size (MDE)
Variance/metric volatility
Sample size
Allocation ratio
Alpha
Variance-reduction (e.g., CUPED)
Bucketing/stratification
Test duration/seasonality
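To see how these levers move power in practice, here is a minimal sketch (Python standard library only) using the normal approximation to the two-sided two-proportion z-test with unpooled variance. The function name power_two_prop and the var_factor argument (a stand-in for variance reduction such as CUPED, where var_factor = 1 − R²) are illustrative assumptions, not an existing library API. The per-arm n of 62,880 is roughly the 80%-power size for the scenario in part (b).

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal: cdf() and inv_cdf()


def power_two_prop(p0, p1, n_control, n_treat, alpha=0.05, var_factor=1.0):
    """Approximate power of a two-sided two-proportion z-test.

    Normal approximation with unpooled variance; var_factor scales the
    variance of the difference (e.g. 0.70 for CUPED with R^2 = 0.30).
    """
    delta = abs(p1 - p0)
    var_diff = p0 * (1 - p0) / n_control + p1 * (1 - p1) / n_treat
    se = (var_factor * var_diff) ** 0.5
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    # Probability that the z statistic lands in either rejection region.
    return _Z.cdf(delta / se - z_crit) + _Z.cdf(-delta / se - z_crit)


n = 62_880  # analyzable users per arm, about the 80%-power size from part (b)
scenarios = {
    "baseline": power_two_prop(0.05, 0.0535, n, n),  # +7% MDE, alpha 0.05, 50/50
    "mde_x2": power_two_prop(0.05, 0.0570, n, n),  # +14% MDE
    "alpha_0.10": power_two_prop(0.05, 0.0535, n, n, alpha=0.10),
    "var_x0.70": power_two_prop(0.05, 0.0535, n, n, var_factor=0.70),
    "split_90_10": power_two_prop(0.05, 0.0535, 1.8 * n, 0.2 * n),  # same total 2n
}
for label, pw in scenarios.items():
    print(f"{label:12s} power = {pw:.3f}")
```

Under these assumptions, doubling the MDE raises power from about 0.80 to nearly 1, a 30% variance cut or a looser alpha of 0.10 adds roughly 0.08 to 0.12, and moving the same total traffic to a 90/10 split drops power below 0.40. That ordering is specific to these inputs rather than a general law.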

(b) Baseline Scenario: Sample Size and Duration

Given:

Baseline conversion p0 = 5.00%
Target relative lift = +7% ⇒ p1 = 5.35% (Δ = 0.35 pp)
Two-sided alpha = 0.05
Desired power = 0.80
Equal allocation (50/50)
80,000 eligible users per day
10% post-randomization attrition (i.e., only 90% produce analyzable outcomes)

Compute:

Required sample size per variant (analyzable users)
Minimum test duration (days)

Show formulas and numeric results.
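A minimal sketch of the standard calculation, assuming the two-sided two-proportion z-test with unpooled variance and that the 80,000 daily eligible users are distinct users randomized 50/50:

n per arm = (z(1 − α/2) + z(1 − β))² × [p0(1 − p0) + p1(1 − p1)] / (p1 − p0)²
= (1.960 + 0.842)² × (0.0475 + 0.05064) / 0.0035²
≈ 7.8489 × 0.098138 / 0.00001225 ≈ 62,879.3, rounded up to 62,880 analyzable users per arm.

Analyzable users per arm per day = 80,000 × 0.90 / 2 = 36,000, so the arithmetic minimum is 62,880 / 36,000 ≈ 1.75, i.e. 2 days (about 139,734 randomized users in total). The helper names n_per_arm and min_days are hypothetical. A pooled-variance version of the formula differs by only a few users.

```python
import math
from statistics import NormalDist


def n_per_arm(p0, p1, alpha=0.05, power=0.80, var_factor=1.0):
    """Per-arm analyzable sample size for a two-sided two-proportion z-test.

    n = (z_{1-alpha/2} + z_{power})^2 * [p0(1-p0) + p1(1-p1)] / (p1 - p0)^2,
    optionally scaled by var_factor (e.g. 1 - R^2 for CUPED).
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    var_sum = p0 * (1 - p0) + p1 * (1 - p1)
    return math.ceil(var_factor * z_sum**2 * var_sum / (p1 - p0) ** 2)


def min_days(n_arm, daily_eligible, attrition, arms=2):
    """Days needed for every arm to reach n_arm analyzable users (equal split)."""
    analyzable_per_arm_per_day = daily_eligible * (1 - attrition) / arms
    return math.ceil(n_arm / analyzable_per_arm_per_day)


n = n_per_arm(0.05, 0.0535)
days = min_days(n, daily_eligible=80_000, attrition=0.10)
# Randomized users needed so that 90% of them remain analyzable across both arms.
randomized_total = math.ceil(2 * n / (1 - 0.10))
print(f"n per arm = {n:,}; randomized total = {randomized_total:,}; min days = {days}")
```

A 2-day run would not cover weekday/weekend cycles; rounding the duration up to at least one full week is a common safeguard against the seasonality lever from part (a).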

(c) With CUPED (R² = 0.30)

Recompute (b) assuming CUPED achieves a 30% relative variance reduction (R² = 0.30). What is the new sample size per variant and duration?
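Under CUPED the outcome Y is replaced by Y − θ(X − mean X), with θ = cov(Y, X) / var(X) and X a pre-experiment covariate, which leaves variance var(Y)(1 − R²). Because n scales linearly with variance, n ≈ 0.70 × 62,879.3 ≈ 44,016 per arm, and 44,016 / 36,000 ≈ 1.22, so the arithmetic minimum is still 2 days (the one-week floor for seasonality still applies). The sketch below recomputes this and checks the variance identity numerically. It takes R² = 0.30 from the prompt rather than estimating it; the helper names and the toy x/y values are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def sample_cov(u, v):
    """Sample covariance with an n - 1 denominator."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def cuped_adjust(y, x):
    """CUPED: y_adj = y - theta * (x - mean(x)), theta = cov(y, x) / var(x).

    x must be a pre-experiment covariate (unaffected by treatment); theta is
    estimated on the pooled sample of both arms.
    """
    theta = sample_cov(y, x) / sample_cov(x, x)
    mx = fmean(x)
    return [b - theta * (a - mx) for a, b in zip(x, y)]


def n_per_arm(p0, p1, alpha=0.05, power=0.80, var_factor=1.0):
    """Two-proportion z-test sample size per arm, variance scaled by var_factor."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    var_sum = p0 * (1 - p0) + p1 * (1 - p1)
    return math.ceil(var_factor * z_sum**2 * var_sum / (p1 - p0) ** 2)


# Sample size with the assumed R^2 = 0.30, i.e. variance x (1 - 0.30).
r2 = 0.30
n_cuped = n_per_arm(0.05, 0.0535, var_factor=1 - r2)
per_arm_per_day = 80_000 * (1 - 0.10) / 2  # 36,000 analyzable users per arm per day
days_cuped = math.ceil(n_cuped / per_arm_per_day)
print(f"CUPED n per arm = {n_cuped:,}; min days = {days_cuped}")

# Toy check (hypothetical values): variance left after adjustment equals
# var(y) * (1 - r^2), where r is the sample correlation of y and x.
x = [0, 1, 1, 2, 3, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1, 1, 1]
y_adj = cuped_adjust(y, x)
r_sq = sample_cov(y, x) ** 2 / (sample_cov(x, x) * sample_cov(y, y))
ratio = sample_cov(y_adj, y_adj) / sample_cov(y, y)
print(f"variance ratio = {ratio:.3f}, 1 - r^2 = {1 - r_sq:.3f}")
```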

(d) Unequal Allocation 90/10

How does switching to a 90/10 allocation (90% control, 10% treatment) affect power at fixed total traffic? Provide intuition and, if possible, a quantitative comparison to equal split.
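With total traffic N split k / (1 − k), the variance of the difference scales as (1/N)(1/k + 1/(1 − k)) when per-user variances are similar. This is minimized at k = 0.5 (value 4). At 90/10 it is 1/0.1 + 1/0.9 ≈ 11.11, i.e. 25/9 ≈ 2.78 times the 50/50 variance: the 10% arm dominates the noise, so the test behaves like an equal split with only 36% of the traffic. If N was sized for 80% power at 50/50, the standard error grows by √2.78 ≈ 1.67 and power falls to roughly 39%. A minimal sketch of that comparison, with illustrative function names and assuming p0(1 − p0) ≈ p1(1 − p1):

```python
from statistics import NormalDist


def variance_multiplier(share_treat):
    """Variance of (p_t - p_c) relative to 1/N, assuming equal per-user variance."""
    return 1 / share_treat + 1 / (1 - share_treat)


def power_at_split(share_treat, z_signal_equal, alpha=0.05):
    """Power when the same total N is split share_treat / (1 - share_treat).

    z_signal_equal is delta / SE under a 50/50 split; another split inflates
    the SE by sqrt(multiplier(split) / multiplier(0.5)).
    """
    z = NormalDist()
    inflation = (variance_multiplier(share_treat) / variance_multiplier(0.5)) ** 0.5
    signal = z_signal_equal / inflation
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(signal - z_crit) + z.cdf(-signal - z_crit)


z = NormalDist()
# Total traffic sized so that a 50/50 split has 80% power at alpha = 0.05.
z_signal_equal = z.inv_cdf(0.975) + z.inv_cdf(0.80)
ratio = variance_multiplier(0.10) / variance_multiplier(0.50)
print(f"90/10 variance inflation = {ratio:.3f}x (effective N = {1 / ratio:.0%} of 50/50)")
for share in (0.5, 0.3, 0.2, 0.1):
    print(f"treatment share {share:.0%}: power = {power_at_split(share, z_signal_equal):.3f}")
```

Conversely, keeping 80% power at 90/10 needs about 2.78 times the total traffic (and duration) of the equal split; the usual reason to accept that cost is to limit how many users are exposed to a risky change.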

(e) Negative Significant Result: Diagnostic Plan

Your test, run for the duration from (b), returns a statistically significant −2% lift (treatment worse), contrary to your prior expectation of +7%. Outline a step-by-step diagnostic plan before drawing conclusions. Include:

SRM checks (and why)
Instrumentation/metric definition audits
Bot/geo/device imbalance
Novelty/learning effects
Outlier clipping
Simpson’s paradox via key segments
Guardrail metrics
Peeking/stopping risk
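Two of these checks are easy to make concrete. An SRM test compares observed arm counts with the designed split; a mismatch usually means randomization, assignment logging, or attrition differs by arm, which can bias every downstream metric, so it should be ruled out first (p < 0.001 is a common alarm threshold). Peeking risk can be shown with an A/A simulation: stopping at the first |z| > 1.96 across repeated looks inflates the false-positive rate well above the nominal 5%. A minimal stdlib sketch of both, with hypothetical counts and function names:

```python
import math
import random


def srm_pvalue(n_control, n_treat, expected_share_treat=0.5):
    """Chi-square goodness-of-fit p-value (1 df) for a sample-ratio mismatch."""
    total = n_control + n_treat
    exp_t = total * expected_share_treat
    exp_c = total - exp_t
    chi2 = (n_treat - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    # For 1 degree of freedom, P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def peeking_false_positive_rate(looks=10, sims=2_000, z_crit=1.96, seed=0):
    """A/A simulation: share of runs that ever cross |z| > z_crit at any look.

    Each look adds one equal-sized batch of data; the running z statistic is
    (sum of batch z-scores) / sqrt(number of batches), which is N(0, 1) at
    every look when there is no true effect.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        total = 0.0
        for k in range(1, looks + 1):
            total += rng.gauss(0.0, 1.0)
            if abs(total / math.sqrt(k)) > z_crit:
                hits += 1  # stopped early and declared a (false) winner
                break
    return hits / sims


# Hypothetical arm counts, not data from this test.
print(f"SRM p-value, 50,000 vs 48,500: {srm_pvalue(50_000, 48_500):.2e}")
print(f"A/A false-positive rate with 10 peeks: {peeking_false_positive_rate():.3f}")
```

With 10 equally spaced looks the simulated false-positive rate typically lands near 0.19, roughly four times the nominal 0.05; if the −2% result came from an early stop rather than the pre-planned horizon, it should be discounted accordingly.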

(f) Decision Tree After Diagnostics

Propose an evidence-based decision tree for what to do next. Specify when to:

(i) Ship
(ii) Iterate with a follow-up test (name one concrete design change to test)
(iii) Rerun (state the precise condition that justifies a rerun)
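One way to encode such a decision tree, as a sketch only: the Readout fields, the order of the checks, and the zero default for the minimum worthwhile lift are all assumptions for illustration. The key design choice is that a rerun is triggered only by invalid measurement (SRM, an instrumentation or metric-definition defect, or analysis before the pre-registered horizon), not by a disappointing but valid result.

```python
from dataclasses import dataclass


@dataclass
class Readout:
    """Hypothetical post-diagnostics summary (field names are illustrative)."""
    srm_ok: bool              # SRM p-value above the chosen threshold (e.g. 0.001)
    instrumentation_ok: bool  # logging, metric definitions, bot filtering verified
    fixed_horizon: bool       # analyzed at the pre-registered sample size, no peeking
    ci_low: float             # lower bound of the CI for the relative lift
    guardrails_ok: bool       # no significant harm on guardrail metrics


def decide(r: Readout, min_worthwhile_lift: float = 0.0) -> str:
    """One possible decision tree returning 'rerun', 'ship', or 'iterate'."""
    # Rerun only when the measurement itself is invalid.
    if not (r.srm_ok and r.instrumentation_ok and r.fixed_horizon):
        return "rerun"
    # Ship when the whole CI clears the minimum worthwhile lift and guardrails hold.
    if r.ci_low > min_worthwhile_lift and r.guardrails_ok:
        return "ship"
    # Valid but not shippable: change the design and run a follow-up test.
    return "iterate"


print(decide(Readout(True, True, True, ci_low=-0.035, guardrails_ok=True)))
```

A valid, significantly negative result such as the −2% in part (e) falls through to "iterate": the follow-up test changes one design element, for example a revised variant, rather than rerunning the same treatment hoping for a different draw.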
