A/B Test Analysis Live Walkthrough (Python)
Context
You are given a user-level randomized experiment dataset experiment.csv with columns:
- user_id
- variant ∈ {A, B}
- assign_ts (UTC timestamp)
- saw_treatment (0/1; whether the user actually saw the treatment)
- country (categorical)
- device (categorical)
- pre_metric (pre-experiment baseline metric)
- active_minutes_d7
- paid_d7 (0/1)
- revenue_d7
- sessions_d7
- crashes_d7
Assumptions:
- One row per unique user (if duplicates exist, keep the earliest assign_ts per user).
- Randomization occurred at the user level.
- Outcomes are 7-day metrics post-assignment.
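
The one-row-per-user assumption can be enforced before any analysis. The following is a minimal sketch, assuming pandas is available; the helper name dedupe_users and the toy rows are made up for illustration, and the real input would come from reading experiment.csv.

```python
import pandas as pd

def dedupe_users(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the row with the earliest assign_ts for each user_id."""
    out = df.copy()
    # Parse timestamps as timezone-aware UTC so they sort chronologically.
    out["assign_ts"] = pd.to_datetime(out["assign_ts"], utc=True)
    out = out.sort_values(["user_id", "assign_ts"], kind="stable")
    return out.drop_duplicates(subset="user_id", keep="first").reset_index(drop=True)

# Toy rows (hypothetical); in practice: df = pd.read_csv("experiment.csv")
toy = pd.DataFrame({
    "user_id": [1, 2, 1, 3],
    "variant": ["A", "B", "A", "B"],
    "assign_ts": [
        "2024-01-02T00:00:00Z",
        "2024-01-01T00:00:00Z",
        "2024-01-01T00:00:00Z",
        "2024-01-03T00:00:00Z",
    ],
})
clean = dedupe_users(toy)
print(clean)
```

It may also be worth counting users whose duplicate rows disagree on variant, since that would point to an assignment or logging problem rather than harmless duplication.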
Tasks
Using Python, do the following:
- Verify randomization via covariate balance tests and visualizations.
- Define and justify the primary metric and guardrails.
- Compute the ITT (intent-to-treat) for the primary metric with 95% CIs using both:
  - Analytic normal approximation (CLT) with cluster-robust SE at the user level.
  - Bootstrap (stratified by variant).
- Apply CUPED using pre_metric and report variance reduction.
- Handle noncompliance by estimating CACE via 2SLS (instrument: variant → saw_treatment). Discuss IV assumptions and diagnostics.
- Check heterogeneity by country and device with multiple-testing control (e.g., Benjamini–Hochberg).
- Assess power and MDE given observed variance and sample size.
- Evaluate sequential peeking risk and show how a spending function or alpha-adjusted boundary would change conclusions.
- Produce plots (ECDFs, quantile treatment effects, covariate-binned effects) to support findings.
- Recommend ship/no-ship and call out the top two residual risks.
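
As a starting point for the ITT and CUPED tasks, here is a minimal sketch, assuming NumPy and synthetic data in place of experiment.csv. The function names itt_ci and cuped_adjust are hypothetical, and the simulated effect size and noise levels are invented for illustration. With one row per user, a user-level cluster-robust SE reduces to the unequal-variance (Welch-style) SE used here.

```python
import numpy as np

def itt_ci(y_a, y_b, z=1.959964):
    """ITT estimate (mean of B minus mean of A) with a normal-approximation 95% CI."""
    diff = y_b.mean() - y_a.mean()
    # Unequal-variance standard error of the difference in means.
    se = np.sqrt(y_a.var(ddof=1) / y_a.size + y_b.var(ddof=1) / y_b.size)
    return diff, diff - z * se, diff + z * se

def cuped_adjust(y, x):
    """CUPED: y - theta * (x - mean(x)), with theta = cov(x, y) / var(x) pooled over arms."""
    theta = np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)
    return y - theta * (x - x.mean())

# Synthetic stand-in for the real columns (all values are made up).
rng = np.random.default_rng(0)
n = 20_000
is_b = rng.integers(0, 2, size=n).astype(bool)          # variant == "B"
pre_metric = rng.gamma(shape=2.0, scale=10.0, size=n)   # pre-experiment baseline
y = 0.8 * pre_metric + rng.normal(0.0, 5.0, size=n) + 0.5 * is_b

raw = itt_ci(y[~is_b], y[is_b])
y_cuped = cuped_adjust(y, pre_metric)
adj = itt_ci(y_cuped[~is_b], y_cuped[is_b])

# Variance reduction of the ITT estimator, read off the squared CI-width ratio.
var_reduction = 1.0 - ((adj[2] - adj[1]) / (raw[2] - raw[1])) ** 2

print(f"raw   ITT={raw[0]:.3f}  CI=({raw[1]:.3f}, {raw[2]:.3f})")
print(f"CUPED ITT={adj[0]:.3f}  CI=({adj[1]:.3f}, {adj[2]:.3f})")
print(f"variance reduction={var_reduction:.1%}")
```

Estimating theta on the pooled sample keeps the adjustment independent of assignment, so the CUPED-adjusted difference in means still targets the same ITT. The remaining tasks (bootstrap, 2SLS, Benjamini–Hochberg, power, sequential boundaries) would build on the same per-arm arrays.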