A/B Test ITT Unbiasedness and Remedies Under Noncompliance, Missing Logs, Interference, and Early Stopping
Setup
- Design: User-level 50/50 A/B test starting Aug 1, 2025.
- Primary metric: Average 14-day engagement minutes per assigned user (users with zero usage contribute 0).
- Observed issues:
  (1) Noncompliance: 20% of Treatment never see the feature; 5% of Control get it (leakage).
  (2) Missing data: 8% of sessions fail to log due to a bug over-represented on Android 13; conditional on device and region, logging is independent of treatment.
  (3) Interference: Viral invites cause 10% of Control users to interact directly with treated users.
  (4) Early stopping: Stopped on day 7 after observing p < 0.05, even though the metric is defined over 14 days.
Let Z ∈ {0,1} denote assignment (0 = Control, 1 = Treatment), and let Ŷ be the measured 14-day minutes. The estimator of interest is the difference in sample mean observed outcomes by assignment, which targets E[Ŷ | Z=1] − E[Ŷ | Z=0].
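As a minimal sketch of this estimator, the difference in means can be computed over all assigned users, keeping zero-usage users as 0 rather than dropping them. The function name `itt_estimate`, the `(z, y)` row layout, and the toy numbers are hypothetical and only for illustration; the standard error uses the unpooled (Welch) form.

```python
import math
from statistics import mean, variance


def itt_estimate(rows):
    """Difference in mean measured outcome by assignment, with a Welch SE.

    rows: iterable of (z, y) pairs, one per ASSIGNED user, where z is the
    assignment (0 = Control, 1 = Treatment) and y is measured 14-day minutes
    (0 for users with no usage). Grouping is by assignment, never by exposure.
    """
    rows = list(rows)
    treated = [y for z, y in rows if z == 1]
    control = [y for z, y in rows if z == 0]
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return diff, se


# Hypothetical toy data, not results from the experiment.
toy_rows = [(1, 30.0), (1, 0.0), (1, 45.0), (1, 15.0),
            (0, 20.0), (0, 0.0), (0, 10.0), (0, 30.0)]
diff, se = itt_estimate(toy_rows)
print(f"ITT estimate: {diff:.2f} minutes (SE {se:.2f})")
```

Grouping by Z rather than by who actually saw the feature is what keeps the comparison protected by randomization.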
Tasks
(a) State the precise assumptions under which this estimator is unbiased for the intent-to-treat (ITT) effect on 14-day engagement.
(b) For each issue (1)–(4), say whether the ITT estimator remains unbiased; if bias arises, state its likely direction and why.
(c) Propose a concrete analysis plan to recover an unbiased or approximately unbiased estimate of the treatment effect on 14-day engagement. Include handling of:
- Noncompliance (e.g., report ITT; optionally estimate TOT via IV with assignment as instrument and list assumptions),
- Missing logs (e.g., IPW or multiple imputation using device/region strata),
- Interference (e.g., cluster-level or exposure-based adjustment/sensitivity analysis),
- Early stopping (e.g., pre-specified alpha spending or sequential corrections).
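Three of these remedies can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed solution: all function names, the stratum labels, the logging rates, and the ITT value of 6.0 are hypothetical. The IV step is the simple Wald ratio, which with two-sided noncompliance is usually read as a complier-average effect and relies on exclusion and monotonicity. The IPW step additionally assumes that, within a device/region stratum, logging failure is unrelated to session length. The early-stopping step uses the Lan-DeMets O'Brien-Fleming-type spending function as one possible pre-specified rule, and treats day 7 as roughly half the information, which is itself an assumption.

```python
from statistics import NormalDist


def wald_estimate(itt, exposure_rate_treatment, exposure_rate_control):
    """IV (Wald) ratio: ITT divided by the difference in exposure rates."""
    first_stage = exposure_rate_treatment - exposure_rate_control
    if first_stage <= 0:
        raise ValueError("assignment must increase exposure for the ratio to be usable")
    return itt / first_stage


def ipw_user_minutes(logged_sessions, log_rate):
    """Inverse-probability-weighted total minutes for one user.

    logged_sessions: list of (stratum, minutes) for sessions that WERE logged.
    log_rate: dict mapping stratum -> estimated P(session logged | stratum);
              a hypothetical input, e.g. estimated from server-side counts.
    """
    return sum(minutes / log_rate[stratum] for stratum, minutes in logged_sessions)


def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction (0, 1]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / info_fraction ** 0.5))


# Exposure rates from the setup: 80% of Treatment and 5% of Control see the feature.
scaled = wald_estimate(itt=6.0, exposure_rate_treatment=0.80, exposure_rate_control=0.05)

# Hypothetical logging rates per device/region stratum.
rates = {"android13": 0.75, "other": 0.95}
minutes = ipw_user_minutes([("android13", 9.0), ("other", 19.0)], rates)

print(f"Wald estimate: {scaled:.2f}")
print(f"IPW-adjusted minutes: {minutes:.2f}")
print(f"Alpha available at half information: {obf_alpha_spent(0.5):.4f}")
```

Under this spending rule an interim look at half the information has far less than 0.05 available, so a day-7 result of p < 0.05 would not by itself justify stopping.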
(d) List specific diagnostic checks to support assumptions (balance checks, missingness patterns by strata, spillover detection, robustness/sensitivity analyses).
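Two of the simpler diagnostics can be sketched as follows, assuming user counts per arm and a pre-treatment covariate are available. The function names and the example counts are hypothetical. The first is a sample-ratio check against the planned 50/50 split using a chi-square test with one degree of freedom; the second is a standardized mean difference for covariate balance.

```python
import math
from statistics import mean, variance


def srm_p_value(n_treatment, n_control, expected_share=0.5):
    """P-value of a chi-square (1 df) test that arm counts match the planned split."""
    n = n_treatment + n_control
    expected_t = n * expected_share
    expected_c = n * (1 - expected_share)
    chi2 = ((n_treatment - expected_t) ** 2 / expected_t
            + (n_control - expected_c) ** 2 / expected_c)
    # For 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def standardized_mean_difference(treated, control):
    """Difference in covariate means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical counts: a 5200 / 4800 split out of 10,000 assigned users.
print(f"SRM p-value: {srm_p_value(5200, 4800):.2e}")
```

A very small sample-ratio p-value points to a problem in assignment or logging that should be resolved before interpreting any effect estimate.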