Context
You are reviewing an observational study that used propensity score matching (PSM) to estimate the causal impact of a UI change on user watch time. Because randomized experimentation was not feasible, the analysts used historical logs and user covariates to construct a matched sample and estimate the ATT (average treatment effect on the treated).
Task
Answer the following about best practices in PSM for product analytics:
- Why is an absolute standardized mean difference (|SMD|) ≤ 0.1 often used as a post-matching balance threshold?
- If logistic regression is not appropriate for the propensity model, what alternatives would you consider, and why?
- How would you diagnose residual confounding after matching?
- Describe one method to estimate the variance of the treatment effect under PSM, and explain when it is appropriate.
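For reference, the SMD for a covariate x is the difference in group means divided by a pooled standard deviation, d = (mean_t - mean_c) / sqrt((s_t^2 + s_c^2) / 2). The Python sketch below computes the SMD and the treated/control variance ratio per covariate, and appends squared and log1p terms so balance is checked beyond the first moment. It is a minimal illustration under stated assumptions: covariates are numeric arrays, the pooled SD is taken from the pre-matching sample so matching cannot "improve" balance by shrinking the denominator, and the 0.1 and 0.5 to 2 cutoffs are common rules of thumb rather than formal tests. The function names augment and balance_diagnostics are hypothetical.

```python
import numpy as np

def augment(X, names):
    """Append squared and log1p terms so balance is checked beyond the mean."""
    X = np.asarray(X, dtype=float)
    squared = X ** 2
    # log1p needs values > -1; clipping at 0 assumes non-negative covariates
    # such as past watch time or session counts (an assumption for this sketch).
    logged = np.log1p(np.clip(X, 0, None))
    new_names = names + [f"{n}^2" for n in names] + [f"log1p({n})" for n in names]
    return np.hstack([X, squared, logged]), new_names

def balance_diagnostics(X_pre, t_pre, X_post, t_post, names,
                        smd_max=0.1, vr_range=(0.5, 2.0)):
    """Per-covariate SMD and variance ratio in the matched sample.

    Returns {name: (smd, variance_ratio, passes)}. The SMD denominator is the
    pooled SD of the pre-matching sample, held fixed across comparisons.
    """
    X_pre, X_post = np.asarray(X_pre, float), np.asarray(X_post, float)
    t_pre, t_post = np.asarray(t_pre, bool), np.asarray(t_post, bool)
    sd_pooled = np.sqrt((X_pre[t_pre].var(axis=0, ddof=1)
                         + X_pre[~t_pre].var(axis=0, ddof=1)) / 2.0)
    treated, control = X_post[t_post], X_post[~t_post]
    smd = (treated.mean(axis=0) - control.mean(axis=0)) / sd_pooled
    # Variance ratio (treated / control) checks the second moment.
    vr = treated.var(axis=0, ddof=1) / control.var(axis=0, ddof=1)
    report = {}
    for name, s, v in zip(names, smd, vr):
        passes = abs(s) <= smd_max and vr_range[0] <= v <= vr_range[1]
        report[name] = (float(s), float(v), bool(passes))
    return report
```

Running balance_diagnostics on the output of augment flags covariates whose means match but whose spread or skew does not, which is one concrete way to probe for residual imbalance after matching.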
Hint: Discuss overlap/positivity, balance diagnostics (including higher moments and transformations), causal estimands (ATE vs ATT), and robust variance estimation.
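One option for the variance question, sketched here under explicit assumptions: greedy 1:1 nearest-neighbor matching without replacement on the logit of an already-estimated propensity score, with a caliper of 0.2 standard deviations of the logit score (a commonly used default). Treated units with no control inside the caliper are dropped. This enforces overlap, but it means the estimate is the ATT for the matched treated units only. The standard error treats matched pairs as the unit of analysis (a paired variance). That is reasonable for matching without replacement, but it conditions on the estimated scores and ignores their estimation uncertainty; the standard bootstrap is not generally valid for nearest-neighbor matching with replacement, where analytic matching-estimator variances are the usual alternative. The function name match_att is hypothetical, and the scores can come from any propensity model (logistic regression, gradient boosting, and so on).

```python
import numpy as np

def match_att(ps, treated, y, caliper_sd=0.2):
    """Greedy 1:1 matching on logit(ps) without replacement; returns (att, se, n_pairs).

    Assumes 0 < ps < 1. The SE is the paired (pair-as-unit) standard error,
    conditional on the estimated propensity scores.
    """
    ps = np.asarray(ps, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    y = np.asarray(y, dtype=float)
    if np.any((ps <= 0) | (ps >= 1)):
        raise ValueError("propensity scores must lie strictly in (0, 1)")
    logit = np.log(ps / (1 - ps))
    caliper = caliper_sd * logit.std(ddof=1)
    controls = list(np.flatnonzero(~treated))  # unmatched control pool
    diffs = []
    for i in np.flatnonzero(treated):  # greedy: result depends on this order
        if not controls:
            break
        dist = np.abs(logit[controls] - logit[i])
        j = int(np.argmin(dist))
        if dist[j] <= caliper:  # outside the caliper -> treated unit dropped
            diffs.append(y[i] - y[controls[j]])
            controls.pop(j)  # without replacement
    diffs = np.asarray(diffs)
    n_pairs = diffs.size
    if n_pairs < 2:
        raise ValueError("need at least two matched pairs for a standard error")
    att = diffs.mean()
    se = diffs.std(ddof=1) / np.sqrt(n_pairs)
    return float(att), float(se), int(n_pairs)
```

Because greedy matching depends on the order in which treated units are processed, common variations process them in random or descending-score order, or use optimal matching instead.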