This question evaluates understanding of regression modeling with interaction terms, binary outcome models and odds/marginal-effect interpretation, causal inference concepts like endogeneity and instrumental variables, model diagnostics and standard-error choices, and count-data modeling.

You have user-level data with a binary outcome retained_7d (1 if the user is active 7 days after signup; 0 otherwise). Covariates include:
The model includes an interaction term treated × new_user.
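As a rough illustration of how the interaction term changes interpretation, the sketch below assumes a logit specification and uses made-up coefficients (the values in COEF are hypothetical, not estimates from any data). It shows that the treatment odds ratio differs by user type, and that the effect on the retention probability depends on the baseline.

```python
import math

# Hypothetical logit coefficients, chosen only for illustration.
COEF = {
    "intercept": -0.50,
    "treated": 0.30,
    "new_user": -0.40,
    "treated:new_user": 0.25,
}


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_index(treated: int, new_user: int, coef=COEF) -> float:
    """Log-odds of retained_7d for a given treated/new_user combination."""
    return (
        coef["intercept"]
        + coef["treated"] * treated
        + coef["new_user"] * new_user
        + coef["treated:new_user"] * treated * new_user
    )


def treatment_odds_ratio(new_user: int, coef=COEF) -> float:
    """Odds ratio of treated vs. untreated within one user type.

    With an interaction, the treatment log-odds effect is
    beta_treated + beta_interaction * new_user.
    """
    return math.exp(coef["treated"] + coef["treated:new_user"] * new_user)


def treatment_effect_on_probability(new_user: int, coef=COEF) -> float:
    """Difference in predicted retention probability, treated minus untreated."""
    return sigmoid(linear_index(1, new_user, coef)) - sigmoid(
        linear_index(0, new_user, coef)
    )


for group, label in ((0, "existing users"), (1, "new users")):
    print(
        label,
        "odds ratio:", round(treatment_odds_ratio(group), 3),
        "probability difference:", round(treatment_effect_on_probability(group), 3),
    )
```

The exponentiated interaction coefficient is a ratio of odds ratios, not an odds ratio itself, which is why the probability-scale differences are usually reported alongside it.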
Answer the following:
Discuss assumptions and diagnostics you would check:
What standard errors would you report and why (HC-robust vs clustered)? Justify your cluster choice.
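To make the HC-robust versus clustered distinction concrete, here is a minimal sketch for the simplest possible case, the sample mean of retained_7d (an intercept-only linear model), without small-sample corrections. The cluster labels are hypothetical; in practice the cluster would be whatever unit shares correlated shocks or the unit at which treatment was assigned.

```python
import math
from collections import defaultdict


def mean_standard_errors(y, cluster_ids):
    """Return (mean, HC0 standard error, cluster-robust standard error).

    HC0 treats every residual as independent; the clustered version first
    sums residuals within each cluster, so within-cluster correlation is
    allowed to inflate (or deflate) the variance.
    """
    n = len(y)
    mean = sum(y) / n
    residuals = [value - mean for value in y]

    # Heteroskedasticity-robust (HC0) variance of the mean.
    hc0_var = sum(e * e for e in residuals) / n**2

    # Cluster-robust variance: square the within-cluster residual sums.
    cluster_sums = defaultdict(float)
    for e, g in zip(residuals, cluster_ids):
        cluster_sums[g] += e
    cluster_var = sum(s * s for s in cluster_sums.values()) / n**2

    return mean, math.sqrt(hc0_var), math.sqrt(cluster_var)


# Toy data: outcomes are identical within each hypothetical cluster.
outcomes = [1] * 4 + [0] * 4 + [1] * 4 + [0] * 4
clusters = ["a"] * 4 + ["b"] * 4 + ["c"] * 4 + ["d"] * 4

mean, hc0_se, cluster_se = mean_standard_errors(outcomes, clusters)
print(mean, hc0_se, cluster_se)
```

When outcomes move together within a cluster, as in this toy data, the clustered standard error is larger than the HC0 one, because the effective number of independent observations is closer to the number of clusters than to the number of users.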
Suppose watch_time_day1 is endogenous (e.g., driven by unobserved preference). Propose two remedies and their assumptions:
What instruments or proxies could be plausible here?
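One of the possible remedies, instrumental variables, can be sketched on simulated data. Everything in the simulation is an assumption made for illustration: the instrument is a hypothetical randomized nudge, the outcome follows a linear probability model, and the numeric values are arbitrary. The single-instrument IV estimate is the ratio cov(z, y) / cov(z, x), which is consistent only if the instrument shifts watch_time_day1 (relevance) and affects retention through no other channel (exclusion).

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate_ols_vs_iv(n=50_000, seed=0, true_effect=0.10):
    """Simulate an endogenous regressor and compare OLS with IV.

    Returns (ols_slope, iv_slope, first_stage_slope).
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.uniform(-1, 1)   # unobserved preference (the confounder)
        zi = rng.randint(0, 1)   # hypothetical randomized nudge (instrument)
        # Simulated watch_time_day1: moved by the nudge AND by preference.
        xi = 2 + 2 * zi + u + rng.uniform(-0.5, 0.5)
        # Retention probability depends on watch time and on preference;
        # with these values it stays inside [0.05, 0.95].
        p = 0.2 + true_effect * xi + 0.2 * u
        z.append(zi)
        x.append(xi)
        y.append(1 if rng.random() < p else 0)

    ols_slope = covariance(x, y) / covariance(x, x)    # biased by u
    iv_slope = covariance(z, y) / covariance(z, x)     # Wald / IV estimate
    first_stage = covariance(z, x) / covariance(z, z)  # relevance check
    return ols_slope, iv_slope, first_stage


ols_slope, iv_slope, first_stage = simulate_ols_vs_iv()
print("OLS:", ols_slope, "IV:", iv_slope, "first stage:", first_stage)
```

In this setup OLS overstates the effect because the unobserved preference raises both watch time and retention, while the IV ratio uses only the variation induced by the nudge. A weak first stage would make the IV estimate noisy and unreliable, so the first-stage strength is worth reporting alongside it.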
You also have count data for daily videos watched. When would Poisson or negative binomial be preferred over OLS? How would you check overdispersion and interpret the exponentiated coefficients?
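A quick first look at overdispersion can be sketched as follows, using an intercept-only Poisson fit, for which the Pearson chi-square statistic divided by its degrees of freedom reduces to the sample variance over the sample mean. The counts and the coefficient value are hypothetical. A ratio well above 1 suggests the Poisson variance assumption is too tight and a negative binomial model may fit better.

```python
import math


def dispersion_ratio(counts):
    """Pearson dispersion for an intercept-only Poisson model.

    The fitted mean is the sample mean, so sum((y - mu)^2 / mu) / (n - 1)
    equals the sample variance divided by the sample mean.
    """
    n = len(counts)
    mu = sum(counts) / n
    pearson_chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return pearson_chi2 / (n - 1)


def incidence_rate_ratio(coefficient):
    """Exponentiated log-link coefficient: multiplicative change in the expected count."""
    return math.exp(coefficient)


def percent_change(coefficient):
    """Percent change in the expected count for a one-unit increase in the covariate."""
    return 100.0 * (incidence_rate_ratio(coefficient) - 1.0)


# Hypothetical daily video counts with a heavy right tail.
daily_videos = [0, 1, 1, 2, 2, 3, 10, 15]
print("dispersion ratio:", dispersion_ratio(daily_videos))

# Hypothetical coefficient on a binary covariate from a log-link count model.
beta = 0.2
print("IRR:", incidence_rate_ratio(beta), "percent change:", percent_change(beta))
```

With covariates in the model, the same idea applies using each observation's fitted mean in place of the overall mean and the residual degrees of freedom in the denominator.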