This question evaluates a data scientist's competency in causal inference and experimental analysis, specifically the estimation and interpretation of ATE, ITT, and TOT/LATE from randomized trials with possible non-compliance using instrumental variables.
You are given a single dataset (CSV) from an A/B experiment on a streaming product. The goal is to estimate the causal effect of a personalization feature on minutes streamed.
Assume one row per user with the following columns:
- user_id (string/int): unique user identifier
- assigned (0/1): randomized assignment (instrument) to personalization (1) vs control (0)
- personalized (0/1): whether the user actually received personalization (there may be non-compliance)
- minutes_streamed (float): outcome measured over a fixed window after assignment
- x1, x2, ... (optional covariates): additional user features (some may be irrelevant)
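1. Estimate the ATE of the personalization feature on minutes_streamed.
2. Estimate the ITT effect of assignment (assigned) on minutes_streamed.
3. Estimate the TOT/LATE of receiving personalization (personalized) using assigned as an instrument.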
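
As a rough illustration of how the ITT and TOT/LATE estimates relate, here is a minimal sketch of the Wald (ratio) estimator, written against plain NumPy arrays. The function name itt_and_late and the synthetic data are assumptions made for this example and are not part of the question. The simulation assumes one-sided non-compliance (control users never receive personalization) and a constant effect of 5 minutes. Under full compliance the ITT would coincide with the ATE; with non-compliance, the ratio below recovers the effect for compliers only, and only if the usual IV assumptions (exclusion restriction, monotonicity, a non-zero first stage) hold.

```python
import numpy as np


def itt_and_late(assigned, personalized, outcome):
    """Difference-in-means ITT, first stage, and Wald (ratio) LATE estimate.

    Hypothetical helper for illustration; inputs are equal-length 0/1 arrays
    for assignment and receipt, plus a numeric outcome array.
    """
    z = np.asarray(assigned, dtype=float)
    d = np.asarray(personalized, dtype=float)
    y = np.asarray(outcome, dtype=float)

    in_treat = z == 1
    in_ctrl = z == 0

    # ITT: effect of being assigned, regardless of actual receipt.
    itt = y[in_treat].mean() - y[in_ctrl].mean()

    # First stage: how much assignment shifts the probability of receipt.
    first_stage = d[in_treat].mean() - d[in_ctrl].mean()
    if abs(first_stage) < 1e-12:
        raise ValueError("Assignment does not change receipt; LATE is undefined.")

    # Wald estimator: ITT rescaled by the compliance difference.
    late = itt / first_stage
    return {"itt": itt, "first_stage": first_stage, "late": late}


# Synthetic data (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 20_000
z = rng.integers(0, 2, size=n)            # randomized assignment
complier = rng.random(n) < 0.6            # about 60% take the feature if assigned
d = z * complier                          # one-sided non-compliance
y = 30 + 5 * d + rng.normal(0, 10, n)     # true effect of receipt: 5 minutes

estimates = itt_and_late(z, d, y)
print(estimates)
```

With a real CSV, the same function could be called on the assigned, personalized, and minutes_streamed columns. A full answer would also report standard errors, for example via two-stage least squares with robust errors, which this sketch omits.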