
Analyze A/B test with rigorous diagnostics

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in experimental design and rigorous A/B test analysis: covariate balance checks, primary-metric definition and guardrails, intent-to-treat estimation with analytic and bootstrap confidence intervals, variance reduction via CUPED, instrumental-variable estimation of the CACE (2SLS), subgroup heterogeneity with multiple-testing control, power/MDE assessment, sequential-testing diagnostics, and visualization of treatment effects. Commonly asked in Analytics & Experimentation interviews, it tests both conceptual understanding of causal inference and statistical diagnostics and the practical skill of implementing a robust A/B test analysis and interpreting its diagnostic output.


Analyze A/B test with rigorous diagnostics

Company: Airbnb

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: hard

Interview Round: Technical Screen

You receive experiment.csv with columns: user_id, variant ∈ {A,B}, assign_ts (UTC), saw_treatment (0/1), country, device, pre_metric (baseline), active_minutes_d7, paid_d7 (0/1), revenue_d7, sessions_d7, crashes_d7. Using Python, do the following live: (1) verify randomization via covariate balance tests and visualizations; (2) define and justify the primary metric and guardrails; (3) compute ITT for the primary metric with 95% CIs using both analytic (CLT with cluster-robust SE at user level) and bootstrap; (4) apply CUPED using pre_metric and report variance reduction; (5) handle noncompliance by estimating CACE via 2SLS (variant→saw_treatment as instrument), and discuss IV assumptions and diagnostics; (6) check heterogeneity by country and device with multiple-testing control (e.g., BH); (7) assess power and MDE given observed variance and sample size; (8) evaluate sequential peeking risk and show how a spending function or alpha-adjusted boundary would change conclusions; (9) produce plots (ECDFs, quantile treatment effects, covariate-binned effects) to support findings; (10) recommend ship/no-ship and call out the top two residual risks.


Related Interview Questions

  • Design and Analyze Airbnb Locker Experiment - Airbnb (medium)
  • Design a network-aware Wi‑Fi badge experiment - Airbnb (medium)
  • Design an A/B test with causal inference - Airbnb (hard)
  • Design robust primary and guardrail metrics - Airbnb (hard)
  • Estimate impact of global launch without holdout - Airbnb (hard)
Oct 13, 2025, 9:49 PM

A/B Test Analysis Live Walkthrough (Python)

Context

You are given a user-level randomized experiment dataset experiment.csv with columns:

  • user_id
  • variant ∈ {A, B}
  • assign_ts (UTC timestamp)
  • saw_treatment (0/1; whether the user actually saw the treatment)
  • country (categorical)
  • device (categorical)
  • pre_metric (pre-experiment baseline metric)
  • active_minutes_d7
  • paid_d7 (0/1)
  • revenue_d7
  • sessions_d7
  • crashes_d7

Assumptions:

  • One row per unique user (if duplicates exist, keep the earliest assign_ts per user).
  • Randomization occurred at the user level.
  • Outcomes are 7-day metrics post-assignment.
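The deduplication rule above (one row per user, earliest assign_ts wins) can be sketched in pandas; the column names follow the stated schema, and the demo frame is hypothetical:

```python
import pandas as pd

def dedupe_users(df: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per user_id, preferring the earliest assign_ts."""
    out = df.copy()
    out["assign_ts"] = pd.to_datetime(out["assign_ts"], utc=True)
    out = out.sort_values(["user_id", "assign_ts"])  # earliest assignment first
    return out.drop_duplicates(subset="user_id", keep="first").reset_index(drop=True)

# Hypothetical demo: user 1 appears twice; the earlier row should survive.
demo = pd.DataFrame({
    "user_id": [1, 1, 2],
    "variant": ["A", "A", "B"],
    "assign_ts": ["2025-10-02T00:00:00Z", "2025-10-01T00:00:00Z", "2025-10-01T12:00:00Z"],
})
clean = dedupe_users(demo)
```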

Tasks

Using Python, do the following:

  1. Verify randomization via covariate balance tests and visualizations.
  2. Define and justify the primary metric and guardrails.
  3. Compute the ITT (intent-to-treat) for the primary metric with 95% CIs using both:
    • Analytic normal approximation (CLT) with cluster-robust SE at the user level.
    • Bootstrap (stratified by variant).
  4. Apply CUPED using pre_metric and report variance reduction.
  5. Handle noncompliance by estimating CACE via 2SLS (instrument: variant → saw_treatment). Discuss IV assumptions and diagnostics.
  6. Check heterogeneity by country and device with multiple-testing control (e.g., Benjamini–Hochberg).
  7. Assess power and MDE given observed variance and sample size.
  8. Evaluate sequential peeking risk and show how a spending function or alpha-adjusted boundary would change conclusions.
  9. Produce plots (ECDFs, quantile treatment effects, covariate-binned effects) to support findings.
  10. Recommend ship/no-ship and call out the top two residual risks.
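Task 1 (covariate balance) might start with standardized mean differences on numeric covariates such as pre_metric. A sketch on synthetic stand-in data follows; the 0.1 threshold is a common rule of thumb, not part of the prompt:

```python
import numpy as np

def smd(x_a, x_b):
    """Standardized mean difference between arms; |SMD| > 0.1 warrants a look."""
    pooled_sd = np.sqrt((np.var(x_a, ddof=1) + np.var(x_b, ddof=1)) / 2)
    return (np.mean(x_a) - np.mean(x_b)) / pooled_sd

# Synthetic stand-in for pre_metric under healthy randomization.
rng = np.random.default_rng(0)
pre_a = rng.normal(10.0, 2.0, 10_000)
pre_b = rng.normal(10.0, 2.0, 10_000)
balance = smd(pre_a, pre_b)
```

For categorical covariates (country, device), a chi-square test of variant versus category plays the analogous role.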
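Task 3 (ITT with both CI flavors) could be sketched as below on synthetic arms. With one row per user and user-level randomization, the cluster-robust SE reduces to the ordinary two-sample robust SE, and the bootstrap resamples within each arm (stratified by variant):

```python
import numpy as np
from scipy import stats

def itt_with_cis(y_a, y_b, n_boot=2000, seed=0):
    """ITT effect (B minus A) with a CLT-based 95% CI and a variant-stratified
    bootstrap 95% CI."""
    diff = np.mean(y_b) - np.mean(y_a)
    se = np.sqrt(np.var(y_a, ddof=1) / len(y_a) + np.var(y_b, ddof=1) / len(y_b))
    z = stats.norm.ppf(0.975)
    analytic = (diff - z * se, diff + z * se)
    rng = np.random.default_rng(seed)
    boots = [np.mean(rng.choice(y_b, size=len(y_b))) - np.mean(rng.choice(y_a, size=len(y_a)))
             for _ in range(n_boot)]  # resample each arm separately
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return diff, analytic, (lo, hi)

# Synthetic arms with a true lift of 0.2.
rng = np.random.default_rng(7)
y_a = rng.normal(0.0, 1.0, 5_000)
y_b = rng.normal(0.2, 1.0, 5_000)
diff, analytic_ci, boot_ci = itt_with_cis(y_a, y_b)
```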
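Task 4 (CUPED) reduces variance by regressing the outcome on pre_metric; a minimal sketch, with theta computed pooled across arms so the adjustment cannot bias the treatment effect:

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED adjustment: y_adj = y - theta * (x - mean(x)),
    where theta = cov(y, x) / var(x). Variance drops by roughly corr(y, x)^2."""
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - np.mean(x))

# Synthetic outcome strongly driven by its pre-period baseline
# (true correlation^2 is 4/5, so expect ~80% variance reduction).
rng = np.random.default_rng(1)
pre = rng.normal(10.0, 2.0, 20_000)
post = pre + rng.normal(0.0, 1.0, 20_000)
variance_reduction = 1 - np.var(cuped_adjust(post, pre)) / np.var(post)
```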
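Task 5 (CACE under noncompliance) is numerically the Wald ratio: ITT on the outcome divided by ITT on uptake, which equals 2SLS with variant as the instrument. The sketch below uses synthetic one-sided noncompliance; a real analysis would also report 2SLS standard errors from an IV package:

```python
import numpy as np

def cace_wald(y_a, y_b, d_a, d_b):
    """Wald/IV estimate of the complier average causal effect.
    Relies on relevance (strong first stage), exclusion (assignment moves the
    outcome only through exposure), and monotonicity (no defiers)."""
    itt_y = np.mean(y_b) - np.mean(y_a)
    itt_d = np.mean(d_b) - np.mean(d_a)  # first stage; report as a diagnostic
    return itt_y / itt_d

# Synthetic setup: control never exposed, 60% of B exposed, true effect of
# exposure = 1.0, so the true CACE is 1.0.
rng = np.random.default_rng(2)
n = 50_000
d_a = np.zeros(n)
d_b = (rng.random(n) < 0.6).astype(float)
y_a = rng.normal(0.0, 1.0, n)
y_b = rng.normal(0.0, 1.0, n) + 1.0 * d_b
cace = cace_wald(y_a, y_b, d_a, d_b)
```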
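Task 6's multiple-testing control can use the Benjamini–Hochberg step-up procedure over the per-subgroup p-values; a self-contained sketch:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of rejections under BH false-discovery-rate control:
    sort p-values, compare p_(k) to alpha * k / m, reject up to the largest
    k that passes."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = len(p)
    thresholds = alpha * (np.arange(1, m + 1) / m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject

# Hypothetical subgroup p-values (e.g., country x device cells).
reject = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20, 0.74])
```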
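Task 7 (power/MDE) plugs the observed outcome variance and arm sizes into the standard two-sample formula; a sketch assuming equal arm sizes and a two-sided z-test:

```python
import numpy as np
from scipy import stats

def mde_two_sample(sd, n_per_arm, alpha=0.05, power=0.80):
    """Minimum detectable effect for a two-sided, two-sample z-test:
    (z_{1-alpha/2} + z_{power}) * sd * sqrt(2 / n_per_arm)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return (z_alpha + z_beta) * sd * np.sqrt(2.0 / n_per_arm)

# Example: sd = 1.0, 10,000 users per arm -> MDE of about 0.04 units.
effect = mde_two_sample(sd=1.0, n_per_arm=10_000)
```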
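For task 8, the qualitative point is that each interim look inflates the boundary needed to hold overall alpha. The sketch below contrasts the naive boundary with an even alpha split across looks; a real design would use an O'Brien–Fleming or Pocock spending function, for which the even split is a conservative stand-in:

```python
from scipy import stats

def peeking_boundaries(n_looks, alpha=0.05):
    """Two-sided z boundary with no correction vs. a Bonferroni-style
    correction split evenly across n_looks interim analyses."""
    naive = stats.norm.ppf(1 - alpha / 2)
    adjusted = stats.norm.ppf(1 - alpha / (2 * n_looks))
    return naive, adjusted

# Five peeks raise the per-look boundary from ~1.96 to ~2.58:
naive_z, adjusted_z = peeking_boundaries(n_looks=5)
```

A result that clears 1.96 but not the adjusted boundary at an interim look should not be called significant.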
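For task 9, the underlying computations (before any matplotlib calls) are small; a sketch of ECDF points and quantile treatment effects on hypothetical data:

```python
import numpy as np

def ecdf(x):
    """Points of the empirical CDF, suitable for a step plot."""
    xs = np.sort(np.asarray(x, dtype=float))
    ys = np.arange(1, len(xs) + 1) / len(xs)
    return xs, ys

def quantile_treatment_effects(y_a, y_b, qs=(10, 25, 50, 75, 90)):
    """Per-quantile B-minus-A differences; these surface tail shifts that a
    mean comparison can hide."""
    return {q: np.percentile(y_b, q) - np.percentile(y_a, q) for q in qs}

# A constant shift of 1.0 shows up at every quantile.
qte = quantile_treatment_effects(np.arange(100.0), np.arange(100.0) + 1.0)
xs, ys = ecdf([3.0, 1.0, 2.0])
```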

