This question evaluates proficiency in regression diagnostics and model selection for count outcomes: checking OLS assumptions, back-transforming and interpreting coefficients after a log transformation, testing for heteroskedasticity and using robust standard errors, detecting multicollinearity (VIF) and autocorrelation, and choosing between OLS and Poisson or Negative Binomial GLMs. It falls under Statistics & Math for Data Scientist roles and tests both conceptual understanding and practical application of statistical modeling. Such questions assess a candidate's ability to validate model assumptions, interpret transformed and categorical effects, and justify modeling choices with diagnostic evidence, which is the kind of statistical reasoning real-world data science work requires.
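For reference, the standard formulas behind several of the topics listed above are summarized here. This is a sketch under one assumption: the log transform is applied to 1 + signups so that zero counts are defined (the exact transform the question expects is not shown).

```latex
% Log-linear OLS on the count outcome (assumed log(1 + y) transform)
\log(1 + y_i) = x_i^\top \beta + \varepsilon_i, \qquad E[\varepsilon_i] = 0

% Interpretation: a one-unit increase in x_j (or switching a region dummy from 0 to 1,
% i.e. moving from the baseline region) multiplies (1 + y) by e^{\beta_j}:
\%\Delta(1 + y) = 100\,\left(e^{\beta_j} - 1\right)

% Back-transformation: Jensen gives E[e^{\varepsilon}] \ge e^{E[\varepsilon]} = 1, so
% e^{x^\top \hat\beta} - 1 understates E[y \mid x]. Duan's smearing estimator
% (assumes \varepsilon is independent of x, i.e. homoskedastic on the log scale):
\hat{y}_i = e^{x_i^\top \hat\beta} \cdot \frac{1}{n} \sum_{k=1}^{n} e^{\hat\varepsilon_k} - 1

% Multicollinearity: R_j^2 from regressing x_j on all other regressors
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}

% Autocorrelation: Durbin-Watson statistic on residuals e_t in their natural order
d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2\,(1 - \hat\rho_1)

% Count models with log link \mu_i = e^{x_i^\top \beta}
\text{Poisson: } \operatorname{Var}(y_i \mid x_i) = \mu_i, \qquad
\text{NB2: } \operatorname{Var}(y_i \mid x_i) = \mu_i + \alpha\,\mu_i^2
```

A Poisson model forces the variance to equal the mean; when the data show more spread than that, the NB2 parameter \(\alpha > 0\) absorbs the extra variance.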
You are given a cleaned dataset with the following columns:
Task: Using Python and statsmodels, draw a 100,000-row sample without replacement and fit an OLS model to predict signups using spend, clicks, cpc, and region dummies. Then:
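The sampling and fitting step of the task can be sketched as follows. The dataset's column list is not shown, so this minimal sketch builds a hypothetical synthetic DataFrame with columns `spend`, `clicks`, `cpc`, `region` (four made-up levels) and `signups` (overdispersed counts); swap in the real data. It also shows one common way to handle the log-transformation topic from the description: fitting OLS on `log1p(signups)` and back-transforming with Duan's smearing factor, which assumes homoskedastic log-scale errors.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic stand-in for the cleaned dataset (replace with the real one).
rng = np.random.default_rng(0)
N = 250_000
df = pd.DataFrame({
    "region": rng.choice(["east", "north", "south", "west"], size=N),
    "spend": rng.gamma(shape=2.0, scale=20.0, size=N),
    "cpc": rng.uniform(1.0, 4.0, size=N),
})
df["clicks"] = rng.poisson(df["spend"] / df["cpc"])
mu = np.exp(1.0 + 0.005 * df["spend"] + 0.01 * df["clicks"]
            - 0.2 * df["cpc"] + 0.3 * (df["region"] == "west"))
df["signups"] = rng.negative_binomial(2, 2 / (2 + mu))  # overdispersed counts with mean mu

# Draw 100,000 rows without replacement; a fixed seed makes the draw reproducible.
sample = df.sample(n=100_000, replace=False, random_state=42)

# OLS on the raw count; C(region) creates k-1 dummies with "east" as the baseline.
formula = "signups ~ spend + clicks + cpc + C(region)"
ols_fit = smf.ols(formula, data=sample).fit()
print(ols_fit.summary())

# Log-linear variant: log(1 + signups) keeps zero counts defined.
sample = sample.assign(log_signups=np.log1p(sample["signups"]))
log_fit = smf.ols("log_signups ~ spend + clicks + cpc + C(region)", data=sample).fit()

# Percent change in (1 + signups) per unit of each regressor, or vs. the baseline region.
pct_effect = 100 * (np.exp(log_fit.params.drop("Intercept")) - 1)

# Back-transformation: expm1(fitted) estimates the conditional geometric mean of
# (1 + signups) minus 1, which understates the mean; smearing rescales it.
smear = np.mean(np.exp(log_fit.resid))
pred_naive = np.expm1(log_fit.fittedvalues)
pred_smear = np.exp(log_fit.fittedvalues) * smear - 1
print(pct_effect.round(2))
print(f"smearing factor: {smear:.3f}")
```

`replace=False` guarantees no row is drawn twice. Letting `C(region)` build the dummies drops one level (alphabetically first, here the hypothetical `east`) as the baseline, which avoids the dummy-variable trap.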
Provide minimal code necessary to reproduce these diagnostics.