Explain power drivers and resolve unexpected A/B results
Company: Thumbtack
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: medium
Interview Round: HR Screen
Answer all parts concisely and with calculations where requested.
(a) Define statistical power for a two-proportion A/B test and list the primary levers that increase power, ranking them by typical practical impact (largest to smallest) and briefly explaining trade-offs. Include: effect size (MDE), variance/metric volatility, sample size, allocation ratio, alpha, variance reduction (e.g., CUPED), bucketing/stratification, and test duration/seasonality.
(b) Scenario: Baseline conversion p0 = 5.0%. Target relative lift = +7% (p1 = 5.35%). Two-sided alpha = 0.05, desired power = 0.80, equal allocation, independent users, no clustering. Compute the required sample size per variant and the minimum test duration (days) if you receive 80,000 eligible users/day with an expected 10% post-randomization attrition. Show formulas and numeric results.
(c) Recompute part (b) assuming CUPED with R^2 = 0.30 (i.e., a 30% relative variance reduction). What is the new sample size per variant and duration?
(d) How does switching to a 90/10 allocation (90% control, 10% treatment) affect power at fixed total traffic? Provide intuition and, if possible, a quantitative comparison to equal split.
(e) Your test, run for the duration from (b), returns a statistically significant −2% lift (treatment worse) contrary to your prior expectation of +7%. Outline a step-by-step diagnostic plan before drawing conclusions: include SRM checks (and why), instrumentation/metric definition audits, bot/geo/device imbalance, novelty/learning effects, outlier clipping, Simpson’s paradox via key segments, guardrail metrics, and peeking/stopping risk.
(f) After diagnostics, propose an evidence-based decision tree: when to (i) ship, (ii) iterate with a follow-up test (specify one design change), or (iii) rerun (state the precise condition that justifies a rerun).
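The arithmetic in parts (b) through (d) can be sketched in a few lines of Python. This is one possible approach, not a reference solution: it assumes the unpooled normal-approximation formula n = (z_{1-alpha/2} + z_{power})^2 * [p0(1-p0) + p1(1-p1)] / (p1 - p0)^2, treats CUPED as a flat multiplier (1 - R^2) on the variance, and ignores the negligible opposite tail when computing two-sided power. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def n_per_variant(p0, p1, alpha=0.05, power=0.80, var_reduction=0.0):
    """Per-variant sample size for a two-sided two-proportion z-test
    with equal allocation (unpooled variance, normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    variance_sum = p0 * (1 - p0) + p1 * (1 - p1)
    n = (z_alpha + z_power) ** 2 * variance_sum / (p1 - p0) ** 2
    # Assumption: variance reduction (e.g. CUPED R^2) scales n linearly.
    return math.ceil(n * (1 - var_reduction))


def days_needed(n_variant, users_per_day, attrition=0.10, arms=2):
    """Whole days needed to collect n_variant usable users in each arm."""
    usable_per_day = users_per_day * (1 - attrition)
    return math.ceil(arms * n_variant / usable_per_day)


def approx_power(p0, p1, n_control, n_treatment, alpha=0.05):
    """Approximate two-sided power for arbitrary arm sizes
    (the opposite-tail rejection probability is ignored)."""
    se = math.sqrt(p0 * (1 - p0) / n_control + p1 * (1 - p1) / n_treatment)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(p1 - p0) / se - z_alpha)


if __name__ == "__main__":
    p0, p1 = 0.05, 0.0535
    n_b = n_per_variant(p0, p1)                      # part (b)
    n_c = n_per_variant(p0, p1, var_reduction=0.30)  # part (c)
    total = 2 * n_b                                  # fixed traffic for (d)
    print("(b)", n_b, "per variant,", days_needed(n_b, 80_000), "days")
    print("(c)", n_c, "per variant,", days_needed(n_c, 80_000), "days")
    print("(d) 50/50 power:", round(approx_power(p0, p1, n_b, n_b), 3))
    print("(d) 90/10 power:",
          round(approx_power(p0, p1, int(0.9 * total), int(0.1 * total)), 3))
```

Under these assumptions the sketch gives roughly 62,900 users per variant for (b) and roughly 44,000 with CUPED for (c). Both round up to 2 days at 72,000 usable users/day, although a full weekly cycle is usually preferred to cover the seasonality named in part (a). For (d), the 90/10 split at the same total traffic drops power from about 0.80 to about 0.38, because the standard error is dominated by the small treatment arm.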
Quick Answer: This question evaluates a data scientist's mastery of A/B testing fundamentals: statistical power and sample-size calculations, effect-size and variance considerations, allocation strategies, variance-reduction methods such as CUPED, and experiment diagnostics including SRM checks, instrumentation audits, and imbalance and segmentation checks.
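For the SRM (sample ratio mismatch) check in part (e), a minimal sketch is a chi-square goodness-of-fit test of the observed arm counts against the intended split. The version below uses only the standard library, relying on the fact that for one degree of freedom the chi-square tail probability equals erfc(sqrt(x / 2)). The counts in the usage lines are hypothetical, and the 0.001 alert threshold is a commonly used convention assumed here, not something stated in the question.

```python
import math


def srm_p_value(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square (1 df) p-value for observed arm counts vs. the intended split."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # Survival function of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    SRM_THRESHOLD = 0.001  # assumed convention; tune to your platform
    # Hypothetical counts for illustration only.
    for control, treatment in [(36_100, 35_900), (36_500, 35_500)]:
        p = srm_p_value(control, treatment)
        flag = "SRM suspected" if p < SRM_THRESHOLD else "ok"
        print(control, treatment, f"p={p:.4g}", flag)
```

A very small p-value suggests that assignment or logging differs between arms, in which case the measured −2% lift should not be interpreted until the cause is found.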