You are given a single dataset (CSV) from an A/B experiment on a streaming product. The goal is to estimate the causal effect of a personalization feature on minutes streamed.
Data
Assume one row per user with the following columns:
- user_id (string/int): unique user identifier
- assigned (0/1): randomized assignment (instrument) to personalization (1) vs control (0)
- personalized (0/1): whether the user actually received personalization (there may be non-compliance)
- minutes_streamed (float): outcome measured over a fixed window after assignment
- x1, x2, ... (optional covariates): additional user features (some may be irrelevant)
Tasks
- ATE (Average Treatment Effect): Estimate the causal effect of receiving personalization on minutes_streamed.
  - State clearly what assumptions you are using (e.g., full compliance vs. unconfoundedness vs. random assignment).
  - Provide an estimator and how you would compute it in Python/R.
- ITT and TOT (with non-compliance):
  - Estimate ITT: the effect of being assigned to personalization (assigned) on minutes_streamed.
  - Estimate TOT (a.k.a. LATE for compliers): the effect of actually receiving personalization (personalized) using assigned as an instrument.
  - Report the formulas and how to compute them.
- IV/LATE conceptual checks:
  - Under what conditions is the IV/TOT estimate a valid causal effect?
  - What population does it apply to (e.g., compliers/always-takers/never-takers)?
  - What breaks if the exclusion restriction or monotonicity fails?
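
As a starting point for the ITT and TOT tasks, here is a minimal Python sketch of the difference-in-means ITT and the Wald (ratio) IV estimator. It is one possible approach, not a reference solution. It assumes the column names listed under Data, uses no covariates, and omits standard errors. The function name `itt_and_tot` and the toy records are hypothetical, made up for illustration. With a single binary instrument and no covariates, the Wald ratio equals the two-stage least squares coefficient. Under full compliance the first stage is 1, so the ratio reduces to the ITT.

```python
from statistics import mean


def itt_and_tot(rows):
    """Difference-in-means ITT and Wald (IV) ratio from per-user records.

    rows: iterable of dicts with keys 'assigned', 'personalized' and
    'minutes_streamed' (values may be strings, e.g. from csv.DictReader).
    """
    # Group (outcome, treatment received) pairs by assignment arm Z.
    arms = {0: [], 1: []}
    for r in rows:
        arms[int(r["assigned"])].append(
            (float(r["minutes_streamed"]), float(r["personalized"]))
        )
    if not arms[0] or not arms[1]:
        raise ValueError("both assignment arms must be non-empty")

    # ITT = E[Y | Z=1] - E[Y | Z=0]
    itt = mean(y for y, _ in arms[1]) - mean(y for y, _ in arms[0])

    # First stage = E[D | Z=1] - E[D | Z=0]: how much assignment shifts take-up.
    first_stage = mean(d for _, d in arms[1]) - mean(d for _, d in arms[0])
    if first_stage == 0:
        raise ValueError("assignment does not change take-up; ratio undefined")

    # Wald estimator: TOT/LATE = ITT / first stage.
    return {"itt": itt, "first_stage": first_stage, "tot": itt / first_stage}


# Hypothetical toy data: 3 of 4 assigned users receive the feature,
# and no control user does (one-sided non-compliance).
toy_rows = [
    {"assigned": 1, "personalized": 1, "minutes_streamed": 60.0},
    {"assigned": 1, "personalized": 1, "minutes_streamed": 50.0},
    {"assigned": 1, "personalized": 1, "minutes_streamed": 70.0},
    {"assigned": 1, "personalized": 0, "minutes_streamed": 40.0},
    {"assigned": 0, "personalized": 0, "minutes_streamed": 40.0},
    {"assigned": 0, "personalized": 0, "minutes_streamed": 40.0},
    {"assigned": 0, "personalized": 0, "minutes_streamed": 50.0},
    {"assigned": 0, "personalized": 0, "minutes_streamed": 30.0},
]

result = itt_and_tot(toy_rows)
print(result)  # {'itt': 15.0, 'first_stage': 0.75, 'tot': 20.0}
```

On the toy data the ITT is 15 minutes and the take-up gap is 0.75, so the ratio is 20 minutes. The ratio is a causal effect for compliers only if random assignment, the exclusion restriction, monotonicity, and a non-zero first stage all hold. It coincides with the effect on the treated when control users cannot receive the feature, as in this example.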