Analyze an A/B test with rigorous diagnostics
Company: Airbnb
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Technical Screen
You receive experiment.csv with columns: user_id, variant ∈ {A, B}, assign_ts (UTC), saw_treatment (0/1), country, device, pre_metric (baseline), active_minutes_d7, paid_d7 (0/1), revenue_d7, sessions_d7, crashes_d7. Using Python, do the following live:
(1) verify randomization with covariate balance tests and visualizations;
(2) define and justify the primary metric and guardrails;
(3) compute the ITT effect on the primary metric with 95% CIs, using both an analytic method (CLT with cluster-robust SEs at the user level) and the bootstrap;
(4) apply CUPED using pre_metric and report the variance reduction;
(5) handle noncompliance by estimating the CACE via 2SLS, with variant as the instrument for saw_treatment, and discuss the IV assumptions and diagnostics;
(6) check heterogeneity by country and device with multiple-testing control (e.g., BH);
(7) assess power and MDE given the observed variance and sample size;
(8) evaluate the risk from sequential peeking and show how an alpha-spending function or alpha-adjusted boundary would change the conclusions;
(9) produce plots (ECDFs, quantile treatment effects, covariate-binned effects) to support the findings;
(10) recommend ship or no-ship and call out the top two residual risks.
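Steps (1), (3), and (4) can be sketched in standard-library Python. This is a minimal sketch under stated assumptions, not a reference solution. The data are HYPOTHETICAL and synthetic: the +1.0 lift and the noise levels are invented, only the pre_metric and active_minutes_d7 roles from experiment.csv are mimicked, and the sketch assumes one row per user. It runs a sample-ratio-mismatch test and a standardized-mean-difference balance check on pre_metric. It then estimates the ITT with a robust analytic CI and a percentile bootstrap CI. Finally it applies CUPED with θ = cov(Y, X)/var(X) and reports the variance reduction. Helper names such as by_arm and itt are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist

nd = NormalDist()
z975 = nd.inv_cdf(0.975)
rng = random.Random(7)

# HYPOTHETICAL synthetic stand-in for experiment.csv, one row per user:
# (variant, pre_metric, active_minutes_d7) with an assumed true ITT lift of +1.0.
rows = []
for _ in range(4000):
    variant = rng.choice("AB")
    pre = rng.gauss(30.0, 10.0)
    y = 0.8 * pre + (1.0 if variant == "B" else 0.0) + rng.gauss(0.0, 6.0)
    rows.append((variant, pre, y))


def by_arm(col):
    """Split column `col` (1 = pre_metric, 2 = outcome) into (arm A, arm B) lists."""
    return ([r[col] for r in rows if r[0] == "A"], [r[col] for r in rows if r[0] == "B"])


# (1) Randomization checks: sample-ratio mismatch against a 50/50 design (chi-square, 1 df)
# and the standardized mean difference (SMD) of pre_metric; |SMD| < 0.1 is a common bar.
pre_a, pre_b = by_arm(1)
expected = (len(pre_a) + len(pre_b)) / 2
chi2 = ((len(pre_a) - expected) ** 2 + (len(pre_b) - expected) ** 2) / expected
srm_p = 2 * (1 - nd.cdf(math.sqrt(chi2)))  # chi-square(1) upper tail via the normal
smd = (statistics.fmean(pre_b) - statistics.fmean(pre_a)) / math.sqrt(
    (statistics.variance(pre_a) + statistics.variance(pre_b)) / 2)


# (3) ITT with a CLT-based CI. With one row per user, a user-clustered SE collapses to a
# heteroskedasticity-robust SE, which for two groups is essentially the Welch SE below.
def itt(a, b):
    diff = statistics.fmean(b) - statistics.fmean(a)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff, se, (diff - z975 * se, diff + z975 * se)


y_a, y_b = by_arm(2)
itt_diff, itt_se, itt_ci = itt(y_a, y_b)

# Percentile bootstrap CI: resample users with replacement within each arm.
B = 500
boot = sorted(statistics.fmean(rng.choices(y_b, k=len(y_b)))
              - statistics.fmean(rng.choices(y_a, k=len(y_a))) for _ in range(B))
boot_ci = (boot[int(0.025 * B)], boot[int(0.975 * B)])

# (4) CUPED: theta = cov(Y, X) / var(X) on pooled data, Y_adj = Y - theta * (X - mean(X)).
xs, ys = [r[1] for r in rows], [r[2] for r in rows]
mx, my = statistics.fmean(xs), statistics.fmean(ys)
theta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
adj_a = [r[2] - theta * (r[1] - mx) for r in rows if r[0] == "A"]
adj_b = [r[2] - theta * (r[1] - mx) for r in rows if r[0] == "B"]
cuped_diff, cuped_se, cuped_ci = itt(adj_a, adj_b)
var_reduction = 1 - (cuped_se / itt_se) ** 2  # reduction in the estimator's variance

print(f"SRM p = {srm_p:.3f}, SMD(pre_metric) = {smd:+.3f}")
print(f"ITT {itt_diff:.3f}, analytic CI ({itt_ci[0]:.3f}, {itt_ci[1]:.3f}), "
      f"bootstrap CI ({boot_ci[0]:.3f}, {boot_ci[1]:.3f})")
print(f"CUPED ITT {cuped_diff:.3f}, CI ({cuped_ci[0]:.3f}, {cuped_ci[1]:.3f}), "
      f"variance reduction {var_reduction:.1%}")
```

With real data you would load experiment.csv (for example with pandas) and repeat the balance checks for country and device, for example with chi-square tests on the variant-by-category tables. If a user can appear in more than one row, you would also cluster the SE. The CUPED variance reduction should land near the squared correlation between pre_metric and the outcome.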
Quick Answer: This question evaluates a data scientist's competence in experimental design and rigorous A/B test analysis. The areas covered are covariate balance checks, primary-metric and guardrail definition, intent-to-treat (ITT) estimation with analytic and bootstrap confidence intervals, variance reduction via CUPED, and instrumental-variable (2SLS) estimation of the CACE. The question also covers subgroup heterogeneity with multiple-testing control, power/MDE assessment, sequential-testing diagnostics, and visualization of treatment effects. It is commonly asked in Analytics & Experimentation interviews and tests two things: conceptual understanding of causal inference, and the practical skill of implementing these analyses and interpreting their diagnostic outputs.
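Steps (5) through (8) cover the IV, heterogeneity, power, and peeking parts summarized above, and they can be sketched in standard-library Python. The data are HYPOTHETICAL: one-sided noncompliance, with about 60% of B seeing the treatment, and an invented +2.0 effect for users who saw it. The helpers (arm, itt_se_and_p, benjamini_hochberg) are illustrative names, not a library API. The sketch covers four parts:
- It computes the CACE as the Wald/2SLS ratio, with an HC0 sandwich SE and a first-stage F.
- It applies Benjamini-Hochberg to per-device p-values.
- It derives the MDE from the observed SE.
- It simulates A/A peeking with and without O'Brien-Fleming-type alpha spending.

```python
import math
import random
import statistics
from statistics import NormalDist

nd = NormalDist()
rng = random.Random(11)

# HYPOTHETICAL synthetic data: one-sided noncompliance (A never sees the treatment,
# ~60% of B does) and an assumed +2.0 effect only for users who actually saw it.
rows = []  # (z = assigned to B, d = saw_treatment, device, outcome)
for _ in range(6000):
    z = int(rng.random() < 0.5)
    d = int(z == 1 and rng.random() < 0.6)
    device = rng.choice(["ios", "android", "web"])
    rows.append((z, d, device, rng.gauss(20.0, 8.0) + 2.0 * d))
Z, D, Y = ([r[i] for r in rows] for i in (0, 1, 3))  # instrument, treatment, outcome

# (5) CACE via 2SLS. With one binary instrument and no covariates, 2SLS reduces to the
# Wald ratio cov(Z, Y) / cov(Z, D); the SE is the HC0 sandwich for that estimator.
zb, db, yb = statistics.fmean(Z), statistics.fmean(D), statistics.fmean(Y)
szd = sum((z - zb) * (d - db) for z, d in zip(Z, D))
szy = sum((z - zb) * (y - yb) for z, y in zip(Z, Y))
cace = szy / szd
resid = [y - (yb - cace * db) - cace * d for d, y in zip(D, Y)]
cace_se = math.sqrt(sum(((z - zb) * u) ** 2 for z, u in zip(Z, resid))) / abs(szd)


def arm(values, arm_z):
    """Values belonging to users with assignment arm_z (0 = A, 1 = B)."""
    return [v for z, v in zip(Z, values) if z == arm_z]


# IV diagnostic: first-stage compliance gap and its robust F (= t^2); F > 10 is a common rule of thumb.
p_b, p_a = statistics.fmean(arm(D, 1)), statistics.fmean(arm(D, 0))
fs_se = math.sqrt(p_b * (1 - p_b) / len(arm(D, 1)) + p_a * (1 - p_a) / len(arm(D, 0)))
first_stage_F = ((p_b - p_a) / fs_se) ** 2


# (6) Per-device ITT p-values (normal approximation), then Benjamini-Hochberg at q = 0.05.
def itt_se_and_p(a, b):
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return se, 2 * (1 - nd.cdf(abs(statistics.fmean(b) - statistics.fmean(a)) / se))


def benjamini_hochberg(pvals, q=0.05):
    """Reject flags (in input order) controlling the false discovery rate at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Step-up rule: largest rank r with p_(r) <= r / m * q; reject ranks 1..r.
    k_max = max((r for r, i in enumerate(order, 1) if pvals[i] <= r / m * q), default=0)
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


devices = ["ios", "android", "web"]
seg_p = [itt_se_and_p([r[3] for r in rows if r[2] == dev and r[0] == 0],
                      [r[3] for r in rows if r[2] == dev and r[0] == 1])[1] for dev in devices]
seg_reject = benjamini_hochberg(seg_p)

# (7) MDE at two-sided alpha = 0.05 and 80% power, from the observed ITT standard error.
itt_se, itt_p = itt_se_and_p(arm(Y, 0), arm(Y, 1))
mde = (nd.inv_cdf(0.975) + nd.inv_cdf(0.80)) * itt_se

# (8) Peeking risk: A/A simulation with K equally spaced looks.
# Naive rule: stop if |z| > 1.96 at any look. Spending rule: Lan-DeMets O'Brien-Fleming-type
# spending, alpha(t) = 2 - 2 * Phi(z_0.975 / sqrt(t)), with each look's boundary set from its
# alpha increment alone (a conservative union-bound shortcut, not exact group-sequential bounds).
K, ALPHA, SIMS = 5, 0.05, 20000
spent = [2 - 2 * nd.cdf(nd.inv_cdf(1 - ALPHA / 2) / math.sqrt((k + 1) / K)) for k in range(K)]
increments = [spent[0]] + [spent[k] - spent[k - 1] for k in range(1, K)]
bounds = [nd.inv_cdf(1 - inc / 2) for inc in increments]

naive_hits = spend_hits = 0
for _ in range(SIMS):
    s, naive, spend = 0.0, False, False
    for k in range(K):
        s += rng.gauss(0.0, 1.0)   # one equal-size batch of data under the null
        zk = s / math.sqrt(k + 1)  # cumulative z-statistic at look k + 1
        naive = naive or abs(zk) > 1.96
        spend = spend or abs(zk) > bounds[k]
    naive_hits += naive
    spend_hits += spend
naive_fpr, spend_fpr = naive_hits / SIMS, spend_hits / SIMS

print(f"CACE {cace:.2f} (SE {cace_se:.2f}), first-stage F {first_stage_F:.0f}")
print(f"device p-values {[round(p, 4) for p in seg_p]} -> BH rejects {seg_reject}")
print(f"MDE (80% power) {mde:.2f}; boundaries {[round(b, 2) for b in bounds]}")
print(f"A/A false-positive rate: naive {naive_fpr:.3f}, spending {spend_fpr:.3f}")
```

With one binary instrument and no covariates, 2SLS coincides with the Wald ratio. Once covariates are added, a dedicated IV routine is the better tool. Only instrument relevance (the first stage) is directly testable here; the exclusion restriction and monotonicity have to be argued from the design. The spending boundaries are a conservative union-bound shortcut. Exact Lan-DeMets boundaries, which account for the correlation between looks, are slightly lower, and those are what group-sequential software computes.

The simulation's point is qualitative. Repeated naive looks push the false-positive rate well above 5%. A final-look z just above 1.96, which would pass the naive rule, can still fail the alpha-spending boundary.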