This question evaluates a data scientist's competency in linear regression diagnostics and remedial modeling. It covers the core OLS assumptions (linearity, no perfect multicollinearity, exogeneity, homoskedasticity, and independent, normally distributed errors), diagnostics and remedies for heteroskedasticity and severe multicollinearity, and method selection among ridge, LASSO, and GLMs. It is commonly asked in the machine learning and statistical modeling domain because it probes detection of assumption violations, interpretation of their impact on standard errors, confidence intervals, and hypothesis tests, and the balance between conceptual inference and practical model refitting, assessing both conceptual understanding and practical application.
You are fitting a linear regression with Ordinary Least Squares (OLS) on a large cross-sectional dataset (n = 10,000). Answer the following:
List the standard OLS assumptions required for unbiased, efficient, and consistent estimates.
For each assumption, provide a diagnostic and a remedy. Examples of diagnostics: residual plots, VIF, White/Breusch–Pagan tests, Durbin–Watson, RESET. Examples of remedies: variable transformations, robust/clustered standard errors, regularization, GLMs.
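As a minimal NumPy sketch (simulated data; the cutoff of 10 for VIF is a common rule of thumb, not part of the question), two of these diagnostics, VIF and a Breusch–Pagan-style LM statistic, can be computed from first principles:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated data with two problems baked in: x2 is nearly collinear with x1,
# and the error standard deviation grows with x1 (heteroskedasticity).
x1 = rng.uniform(1.0, 3.0, size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + x1 * rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])

def ols_resid(X, y):
    """OLS residuals via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def r_squared(X, y):
    resid = ols_resid(X, y)
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

def vif(X, j):
    """Variance inflation factor for column j: 1 / (1 - R_j^2), where
    R_j^2 comes from regressing column j on the remaining columns."""
    others = np.delete(X, j, axis=1)
    return 1.0 / (1.0 - r_squared(others, X[:, j]))

def breusch_pagan_lm(X, y):
    """Breusch-Pagan LM statistic: n * R^2 from regressing squared residuals
    on the regressors; compare to chi-square with df = number of slopes."""
    resid = ols_resid(X, y)
    return len(y) * r_squared(X, resid ** 2)

print(vif(X, 1))               # well above the common rule-of-thumb cutoff of 10
print(breusch_pagan_lm(X, y))  # far beyond the chi2(2) critical value of ~9.2 at 1%
```

Production code would normally use `statsmodels` for these tests; the hand-rolled versions above are only meant to make the mechanics of each diagnostic explicit.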
Given evidence of heteroskedasticity and severe multicollinearity in the fitted model, outline the exact steps to detect each violation and apply an appropriate remedy.
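One standard remedy for heteroskedasticity, heteroskedasticity-consistent (White/HC0) standard errors, can be sketched in NumPy as follows (simulated data; HC0 is chosen here for illustration, the question does not prescribe a particular variant):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Simulated heteroskedastic data: the error spread grows sharply with x.
x = rng.uniform(1.0, 3.0, size=n)
y = 1.0 + 2.0 * x + x ** 2 * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)

# Classical OLS standard errors assume a constant error variance sigma^2:
# Var(beta) = sigma^2 (X'X)^-1.
sigma2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# White/HC0 sandwich estimator stays valid under heteroskedasticity:
# Var(beta) = (X'X)^-1 X' diag(e_i^2) X (X'X)^-1.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

# The coefficient estimates are unchanged (OLS stays unbiased), but the two
# sets of standard errors disagree; inference should use se_robust.
```

Note that the point estimates themselves are identical under both variance estimators: heteroskedasticity affects efficiency and inference, not unbiasedness.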
When would you switch from OLS to ridge/LASSO or to a GLM, and why?
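The ridge case can be illustrated with a closed-form NumPy sketch (simulated data; the penalty value 100 is an arbitrary illustrative choice, in practice it would be tuned by cross-validation):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Two nearly collinear predictors; only x1 truly drives y.
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
y = 3.0 * x1 + rng.normal(size=n)

X = np.column_stack([x1, x2])   # predictors are already roughly centered

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam * I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)      # lam = 0 recovers plain OLS
beta_ridge = ridge(X, y, 100.0)  # illustrative penalty, not a tuned value

# Collinearity inflates the variance of individual OLS coefficients, while
# their sum (the well-identified direction) stays stable near 3. Ridge trades
# a little bias for a large variance reduction, shrinking the coefficient
# vector toward zero and splitting credit between x1 and x2.
```

LASSO has no closed form (it requires coordinate descent or similar), and a GLM replaces the squared-error objective with a likelihood suited to the response distribution, so both are typically fit with a library such as scikit-learn or statsmodels rather than by hand.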