Design an A/B test with causal inference
Company: Airbnb
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Technical Screen
You own experimentation for an e-commerce checkout nudge. Design an A/B test randomized at the guest_id level and run for 28 days (2025-08-04 to 2025-08-31). Primary metric: completed order within 7 days of first exposure. Guardrails: bounce rate and p95 page latency. Baseline 7-day per-guest conversion is 5%; minimum detectable relative lift is 8%; two-sided α=0.05; power=0.80. Average 1.6 sessions per guest with ICC=0.05. Constraints: repeat visitors across devices, 5% bot traffic, and cookie resets causing cross-arm contamination.

Answer:

1) Define the estimand (ITT vs. TOT) and justify the unit of randomization (guest vs. session) and the exposure definition, addressing cross-device deduping and noncompliance.
2) Compute the required per-arm sample size accounting for clustering (show the design effect and the final n per arm).
3) Specify SRM and integrity checks (e.g., device/geo imbalance, traffic-source mix): how to detect them and how to remediate.
4) If randomization fails and you only have pre/post windows (pre: 2025-07-01–2025-07-31; post: 2025-09-01–2025-09-30), formulate a credible causal strategy (e.g., DiD with covariates/CUPED, or PSM/IPW): state the identifying assumptions, write the ATE estimator, and describe how you would test parallel trends and overlap.
5) Address interference and novelty effects, and propose a sequential monitoring plan that controls Type I error (e.g., O'Brien–Fleming boundaries) along with a plan for early stopping for harm.
6) Suppose the experiment ends with control conv=5.0% (n=120,000) and treatment conv=5.6% (n=120,000). Compute the lift, its standard error, and the 95% CI (properly accounting for the two-proportion comparison), and interpret both statistical and practical significance. Would you ship under these constraints?
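The sample-size calculation in part 2 can be sketched directly from the prompt's numbers using only the standard library. This is an illustrative sketch, not the only defensible formula: it uses the common two-proportion approximation with pooled variance under the null and unpooled variance under the alternative, then inflates by the design effect 1 + (m − 1)·ICC.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Inputs taken from the prompt.
p_control = 0.05                      # baseline 7-day per-guest conversion
rel_mde = 0.08                        # minimum detectable relative lift
p_treat = p_control * (1 + rel_mde)   # 0.054 under the alternative
alpha, power = 0.05, 0.80
sessions_per_guest, icc = 1.6, 0.05

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96
z_beta = NormalDist().inv_cdf(power)            # ~0.84

# Two-proportion sample size per arm.
p_bar = (p_control + p_treat) / 2
delta = p_treat - p_control
n_raw = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p_control * (1 - p_control)
                         + p_treat * (1 - p_treat))) ** 2 / delta ** 2

# Design effect for sessions clustered within guests; this matters when a
# session-level metric is analyzed under guest-level randomization.
deff = 1 + (sessions_per_guest - 1) * icc       # 1 + 0.6 * 0.05 = 1.03
n_per_arm = ceil(n_raw * deff)
print(deff, n_per_arm)                          # deff = 1.03, n ≈ 50k per arm
```

With these inputs the design effect is a modest 1.03, so the final requirement lands near 50,000 guests per arm, comfortably below the 120,000 per arm observed in part 6.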
Quick Answer: This question evaluates experimental design, causal inference, and applied statistics within the Analytics & Experimentation domain for a Data Scientist role. It covers estimand selection, sample-size calculation under clustering, integrity monitoring, handling noncompliance and contamination, sequential monitoring with Type I error control, and two-proportion inference. It is commonly asked because interviewers need to assess whether a candidate can design and analyze robust A/B tests under real-world constraints: the prompt demands both conceptual command of causal assumptions and practical application of power calculations, diagnostics, and monitoring procedures.
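The final-readout arithmetic in part 6 can also be sketched with the stated numbers. This is a minimal sketch using the unpooled standard error for the difference in proportions (a pooled-under-the-null SE gives a nearly identical z here); variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Final results from the prompt.
n_c = n_t = 120_000
p_c, p_t = 0.050, 0.056

diff = p_t - p_c              # absolute lift: 0.006 (0.6 percentage points)
rel_lift = diff / p_c         # relative lift: 12%, above the 8% MDE

# Unpooled standard error for the two-proportion comparison.
se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
z = diff / se
z975 = NormalDist().inv_cdf(0.975)
ci = (diff - z975 * se, diff + z975 * se)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(rel_lift, se, ci, z, p_value)
```

The 95% CI for the absolute lift excludes zero by a wide margin (z well above the usual 1.96 cutoff), so the result is statistically significant; whether it is practically significant, and shippable, still depends on the guardrail metrics and the contamination caveats the question raises.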