Explain multicollinearity and OLS assumptions
Company: Citadel
Role: Data Scientist
Category: Statistics & Math
Difficulty: medium
Interview Round: Technical Screen
In linear regression:
1) List and explain the OLS assumptions (linearity in parameters, exogeneity/zero conditional mean of errors, independence/no autocorrelation, homoscedasticity, normality of errors for exact finite-sample inference, no perfect multicollinearity, correct specification).
2) Define multicollinearity and describe its effects on coefficient variance, stability, confidence intervals, and p-values while noting that OLS point estimates remain unbiased.
3) Show how to diagnose multicollinearity (correlation matrix, VIF thresholds, condition number, eigenvalue analysis).
4) Propose remedies (collect more data, drop/combine features, center variables before forming interaction terms, ridge/LASSO/elastic net, PCA/PLS) and discuss their trade-offs.
5) If two predictors are perfectly collinear, what happens to X'X and how do implementations typically handle it?
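For part 5, a minimal NumPy sketch may help. It assumes a synthetic design in which `x2` is exactly twice `x1`; the data, seed, and variable names are illustrative and not part of the question. With perfect collinearity, X loses column rank, so X'X is singular and the normal equations have infinitely many solutions. `np.linalg.lstsq` is one implementation that returns the minimum-norm solution in this case; other software may instead drop a redundant column or raise an error.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for reproducibility only
n = 50
x1 = rng.normal(size=n)
x2 = 2.0 * x1                             # perfectly collinear with x1 (illustrative)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 3.0 * x1 + rng.normal(scale=0.1, size=n)

# X has 3 columns but only 2 are linearly independent; X'X has the same rank.
rank = np.linalg.matrix_rank(X)

# The smallest eigenvalue of X'X is zero up to rounding, so X'X has no inverse.
XtX = X.T @ X
eigvals = np.linalg.eigvalsh(XtX)         # ascending order

# SVD-based least squares still returns an answer: the minimum-norm solution.
beta, _, lstsq_rank, _ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# Dropping the redundant column gives a unique solution with the same fit.
X_reduced = X[:, :2]
beta_reduced, *_ = np.linalg.lstsq(X_reduced, y, rcond=None)
fitted_reduced = X_reduced @ beta_reduced

print("rank of X:", rank)
print("smallest / largest eigenvalue of X'X:", eigvals[0] / eigvals[-1])
print("minimum-norm coefficients:", beta)
print("coefficients after dropping x2:", beta_reduced)
```

The individual coefficients on `x1` and `x2` are not identified, but the fitted values are: only the combination `beta[1] + 2 * beta[2]` is pinned down by the data, and it matches the slope from the reduced model.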
Quick Answer: This question tests understanding of ordinary least squares (OLS) assumptions and multicollinearity. It covers estimator properties, diagnostic metrics, and the consequences of collinearity for the design matrix, drawing on statistical modeling and numerical linear algebra.
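The diagnostics named in part 3 can be sketched with NumPy alone. This is an illustrative example on synthetic data, not a reference implementation: the helper `vif`, the seed, and the noise scale are assumptions made for the demonstration. It uses the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors plus an intercept.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(42)            # arbitrary seed
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)    # nearly collinear with x1
x3 = rng.normal(size=n)                    # unrelated predictor
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)        # pairwise correlation matrix
vifs = vif(X)                              # large for x1 and x2, near 1 for x3

# Condition number of the standardized predictors, and the eigenvalues of
# the correlation matrix; a near-zero eigenvalue signals near-collinearity.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
cond = np.linalg.cond(Z)
eigvals = np.linalg.eigvalsh(corr)         # ascending order

print("correlation matrix:\n", corr)
print("VIFs:", vifs)
print("condition number:", cond)
print("eigenvalues of correlation matrix:", eigvals)
```

Common rules of thumb flag VIF values above 5 or 10, but these are conventions rather than formal tests. A pairwise correlation matrix can miss collinearity that involves three or more predictors, which is why VIF and eigenvalue-based checks are usually reported alongside it.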