Diagnose and fix linear regression assumption breaks

Q: Diagnose and fix linear regression assumption breaks

This question tests a data scientist's command of linear regression diagnostics and remedial modeling. It covers the core OLS assumptions (linearity, no perfect multicollinearity, exogeneity, homoskedasticity, independent errors, and normality), how to detect and remedy heteroskedasticity and severe multicollinearity, and when to choose ridge/LASSO or a GLM instead. It is common in Machine Learning and statistical modeling interviews because it probes several skills at once. The candidate has to detect assumption violations, explain how they affect standard errors, confidence intervals, and hypothesis tests, and then refit a better model. That combination checks both conceptual understanding and practical application.

Question

OLS Assumptions, Diagnostics, Remedies, and Refitting Under Heteroskedasticity and Multicollinearity

You are fitting a linear regression with Ordinary Least Squares (OLS) on a large cross-sectional dataset (n = 10,000). Answer the following:

1) Core OLS Assumptions

List the standard OLS assumptions behind unbiased, efficient, and consistent estimates and valid inference:

Linearity / correct specification
No perfect multicollinearity
Exogeneity: E[ε | X] = 0
Homoskedasticity (constant variance)
No autocorrelation / independence of errors
Normality of errors (only needed for exact finite-sample t/F inference)
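For reference, the standard OLS algebra shows which assumption supports which property. The notation is the usual textbook one and is introduced here rather than taken from the question: X is the design matrix with an intercept column, and Ω = Var(ε | X).

```latex
\begin{aligned}
&y = X\beta + \varepsilon, \qquad \hat\beta = (X^\top X)^{-1} X^\top y
  && \text{(requires } X^\top X \text{ invertible: no perfect multicollinearity)} \\
&\mathbb{E}[\hat\beta \mid X] = \beta + (X^\top X)^{-1} X^\top \mathbb{E}[\varepsilon \mid X] = \beta
  && \text{(linearity + exogeneity: unbiased)} \\
&\operatorname{Var}(\hat\beta \mid X) = (X^\top X)^{-1} X^\top \Omega X (X^\top X)^{-1}
  && \text{(general sandwich form)} \\
&\Omega = \sigma^2 I \;\Rightarrow\; \operatorname{Var}(\hat\beta \mid X) = \sigma^2 (X^\top X)^{-1}
  && \text{(homoskedastic, uncorrelated errors: classical SEs, Gauss--Markov BLUE)} \\
&\varepsilon \mid X \sim \mathcal{N}(0, \sigma^2 I) \;\Rightarrow\; \text{exact } t \text{ and } F \text{ tests}
  && \text{(normality, needed only in finite samples)}
\end{aligned}
```

When Ω is not σ²I, the sandwich is still the correct variance. This is why robust (HC) and clustered SEs plug an estimate of X'ΩX into the middle while leaving β̂ itself unchanged.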

For each assumption, provide:

One concrete diagnostic
One concrete remedy

Examples of diagnostics: residual plots, VIF, White/Breusch–Pagan, Durbin–Watson, RESET. Examples of remedies: transformations, robust/clustered SEs, regularization, GLMs.
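Most of these diagnostics reduce to a few auxiliary regressions. The sketch below implements them using only NumPy:

- VIF.
- Breusch–Pagan, in Koenker's studentized form. White's test is the same auxiliary regression with squares and cross-products of the regressors added.
- Durbin–Watson.
- Ramsey RESET.
- Jarque–Bera, as an extra normality check not named in the question.

The data are simulated to resemble the scenario in part 2. The data-generating process, seed, and helper names are illustrative assumptions. In practice, statsmodels ships tested versions: `het_breuschpagan`, `variance_inflation_factor`, `durbin_watson`, and `linear_reset`.

Exogeneity has no residual-based test of this kind, because OLS residuals are orthogonal to X by construction. It is usually argued from the study design or checked with instruments.

```python
import numpy as np

def ols_resid(X, y):
    """Least-squares residuals and fitted values; X must already include an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return y - fitted, fitted

def r_squared(X, y):
    resid, _ = ols_resid(X, y)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def vif(Xr):
    """VIF_j = 1 / (1 - R_j^2): regress column j on the other columns plus an intercept."""
    n, k = Xr.shape
    out = []
    for j in range(k):
        Z = np.column_stack([np.ones(n), np.delete(Xr, j, axis=1)])
        out.append(1.0 / (1.0 - r_squared(Z, Xr[:, j])))
    return np.array(out)

def breusch_pagan_lm(X, resid):
    """Koenker's studentized BP statistic: n * R^2 of e^2 on X; ~ chi2(k - 1) under H0."""
    return len(resid) * r_squared(X, resid ** 2)

def durbin_watson(resid):
    """About 2 when there is no first-order autocorrelation; needs a meaningful row order."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def jarque_bera(resid):
    """JB = n/6 * (skew^2 + (kurtosis - 3)^2 / 4); ~ chi2(2) under normal errors."""
    e = resid - resid.mean()
    m2 = np.mean(e ** 2)
    skew = np.mean(e ** 3) / m2 ** 1.5
    kurt = np.mean(e ** 4) / m2 ** 2
    return len(e) / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

def reset_f(X, y, powers=(2, 3)):
    """Ramsey RESET: F-test that added powers of the fitted values are jointly zero."""
    resid_r, fitted = ols_resid(X, y)
    # Standardizing the fitted values leaves the augmented column span unchanged
    # (the intercept and fitted values are already in span(X)) but improves conditioning.
    z = (fitted - fitted.mean()) / fitted.std()
    Xu = np.column_stack([X] + [z ** p for p in powers])
    resid_u, _ = ols_resid(Xu, y)
    q, df = len(powers), len(y) - Xu.shape[1]
    return ((resid_r @ resid_r - resid_u @ resid_u) / q) / (resid_u @ resid_u / df)

# Hypothetical simulated data: error sd proportional to x1, corr(x2, x3) = 0.98.
rng = np.random.default_rng(0)
n = 10_000
x1 = rng.uniform(1.0, 5.0, n)
x2 = rng.normal(size=n)
x3 = 0.98 * x2 + np.sqrt(1 - 0.98 ** 2) * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 0.5 * x3 + x1 * rng.normal(size=n)

Xr = np.column_stack([x1, x2, x3])
X = np.column_stack([np.ones(n), Xr])
resid, _ = ols_resid(X, y)

vifs = vif(Xr)
bp_lm = breusch_pagan_lm(X, resid)
dw = durbin_watson(resid)
jb = jarque_bera(resid)
reset_stat = reset_f(X, y)
print("VIF (x1, x2, x3):", np.round(vifs, 1))
print(f"BP LM = {bp_lm:.1f} (chi2(3) 5% critical value about 7.81)")
print(f"DW = {dw:.3f}, JB = {jb:.1f} (chi2(2) 5% critical about 5.99), RESET F = {reset_stat:.2f}")
```

With this data-generating process, expect the following:

- VIFs near 25 for x2 and x3.
- A very large BP statistic.
- DW close to 2, because the rows are independent draws.
- An inflated JB even though the underlying shocks are Gaussian. Scaling them by x1 makes the marginal residual distribution heavy-tailed, a reminder that these tests are not independent of one another.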

2) Scenario: Heteroskedasticity and Multicollinearity

Given:

n = 10,000
Var(ε | X) ∝ x1² (heteroskedasticity driven by x1)
corr(x2, x3) = 0.98 (severe multicollinearity)
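The two givens translate into concrete numbers and a weighting scheme. The worked sketch below makes two assumptions that are not stated in the question. First, x1 is nonzero in every row. Second, x1 is roughly uncorrelated with x2 and x3, so the R² from regressing x2 (or x3) on the other regressors is about 0.98². Here x_i denotes the i-th row of X.

```latex
\begin{aligned}
\mathrm{VIF}_j &= \frac{1}{1 - R_j^2} \approx \frac{1}{1 - 0.98^2} = \frac{1}{0.0396} \approx 25.3
  \quad (j = 2, 3), \qquad \sqrt{25.3} \approx 5.0 \\
\operatorname{Var}(\varepsilon_i \mid X) = \sigma^2 x_{1i}^2
  \;&\Rightarrow\; \frac{y_i}{|x_{1i}|} = \frac{x_i^\top \beta}{|x_{1i}|} + \frac{\varepsilon_i}{|x_{1i}|},
  \qquad \operatorname{Var}\!\left(\frac{\varepsilon_i}{|x_{1i}|} \,\middle|\, X\right) = \sigma^2 \\
\hat\beta_{\mathrm{WLS}} &= (X^\top W X)^{-1} X^\top W y, \qquad W = \operatorname{diag}\!\left(1 / x_{1i}^2\right)
\end{aligned}
```

The standard errors on x2 and x3 are therefore about five times what they would be with uncorrelated regressors of the same variance, and a larger n does not change that ratio. The WLS weights remove the heteroskedasticity exactly only if the assumed variance model is right.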

Outline exact steps to:

Validate assumptions
Refit models with appropriate fixes
Compare models
Describe expected changes to standard errors, confidence intervals, and hypothesis tests
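The validate, refit, and compare steps can be sketched end to end on simulated data. This NumPy-only example assumes a hypothetical data-generating process that matches the scenario:

- x1 is uniform on [1, 5].
- The error standard deviation is proportional to x1.
- The population corr(x2, x3) is 0.98.
- The true coefficients are chosen arbitrarily.

It compares four fits:

- OLS with classical SEs.
- OLS with HC3 robust SEs.
- WLS with weights 1/x1².
- OLS with x2 and x3 replaced by their average.

In statsmodels, the robust and weighted fits correspond to `sm.OLS(y, X).fit(cov_type="HC3")` and `sm.WLS(y, X, weights=1 / x1**2).fit()`.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical DGP matching the scenario: Var(eps | X) proportional to x1^2,
# population corr(x2, x3) = 0.98; the coefficients 1, 2, 0.5, 0.5 are arbitrary.
x1 = rng.uniform(1.0, 5.0, n)
x2 = rng.normal(size=n)
x3 = 0.98 * x2 + np.sqrt(1 - 0.98 ** 2) * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 0.5 * x3 + x1 * rng.normal(size=n)

def fit(X, y):
    """OLS via the normal equations; returns coefficients, residuals, and (X'X)^-1."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    return beta, y - X @ beta, xtx_inv

def classical_se(X, resid, xtx_inv):
    """SEs assuming homoskedastic errors: sqrt(diag(s^2 (X'X)^-1))."""
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    return np.sqrt(np.diag(s2 * xtx_inv))

def hc3_se(X, resid, xtx_inv):
    """HC3 sandwich SEs: (X'X)^-1 X' diag(e_i^2 / (1 - h_ii)^2) X (X'X)^-1."""
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)   # leverages h_ii = x_i' (X'X)^-1 x_i
    meat = (X * ((resid / (1.0 - h)) ** 2)[:, None]).T @ X
    return np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))

ones = np.ones(n)
X = np.column_stack([ones, x1, x2, x3])

# Fits A and B: identical OLS coefficients, classical vs HC3 standard errors.
b_ols, e_ols, inv_ols = fit(X, y)
se_classical = classical_se(X, e_ols, inv_ols)
se_hc3 = hc3_se(X, e_ols, inv_ols)

# Fit C: WLS with weights 1 / x1^2 equals OLS on rows divided by x1 (x1 > 0 here).
Xw, yw = X / x1[:, None], y / x1
b_wls, e_wls, inv_wls = fit(Xw, yw)
se_wls = classical_se(Xw, e_wls, inv_wls)

# Fit D: replace the collinear pair by its average (this changes the estimand).
x23 = (x2 + x3) / 2.0
X_avg = np.column_stack([ones, x1, x23])
b_avg, e_avg, inv_avg = fit(X_avg, y)
se_avg = hc3_se(X_avg, e_avg, inv_avg)

for i, name in enumerate(["const", "x1", "x2", "x3"]):
    lo, hi = b_ols[i] - 1.96 * se_hc3[i], b_ols[i] + 1.96 * se_hc3[i]
    print(f"{name:5s} OLS {b_ols[i]:6.3f}  SE classical {se_classical[i]:.4f}  "
          f"HC3 {se_hc3[i]:.4f}  95% CI [{lo:.3f}, {hi:.3f}]  |  "
          f"WLS {b_wls[i]:6.3f} (SE {se_wls[i]:.4f})")
print(f"avg(x2, x3): {b_avg[2]:.3f} (HC3 SE {se_avg[2]:.4f})")
```

The expected pattern is as follows:

- **Classical vs HC3:** point estimates are identical, and only the SEs, CIs, and t-tests change. For this DGP the x1 SE typically rises slightly, because the error variance is highest where x1 is far above its mean.
- **WLS:** it shrinks the x1 SE and gives narrower, still valid CIs.
- **x2 and x3:** their SEs are roughly five times the SE on the averaged regressor. Individual t-tests on x2 and x3 can therefore look weak even when a joint F-test on the pair is decisive.

Averaging changes the estimand to a single combined effect. That is only appropriate if x2 and x3 measure the same underlying construct.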

3) Method Choice

When would you switch from OLS to ridge/LASSO or to a GLM, and why?
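The trade-off in part 3 can be made concrete with two small NumPy-only sketches on hypothetical simulated data:

- **Ridge vs OLS:** ridge is fitted in closed form on standardized predictors and compared with OLS when corr(x2, x3) = 0.98. The two are judged by the mean squared error of the x2 coefficient across repeated samples.
- **Poisson GLM:** a count outcome is fitted with a log link by iteratively reweighted least squares (IRLS). Plain OLS would ignore both the positivity of the mean and a variance that grows with the mean.

The sample size, penalty value, and coefficients are illustrative choices. LASSO is not coded here. It swaps the L2 penalty for an L1 penalty (for example, scikit-learn's `LassoCV`). With predictors this correlated, it tends to keep one of x2 or x3 and zero out the other.

```python
import numpy as np

def ridge_slopes(X, y, lam):
    """Ridge on standardized predictors with an unpenalized intercept.
    Returns slopes on the original scale; lam = 0 reproduces the OLS slopes."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd
    k = Z.shape[1]
    b_std = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ (y - y.mean()))
    return b_std / sd

def poisson_irls(X, y, max_iter=50, tol=1e-10):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    mu = y + 0.5                       # common starting values; avoids log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu        # working response
        w = mu                         # working weights for the canonical log link
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
        eta = X @ beta_new
        mu = np.exp(eta)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta

rng = np.random.default_rng(7)

# 1) Ridge vs OLS under corr(x2, x3) = 0.98, repeated over many small samples.
#    n = 200 and lam = 20 are illustrative; choose lam by cross-validation in practice.
reps, n, lam = 500, 200, 20.0
ols_b2, ridge_b2 = [], []
for _ in range(reps):
    x1 = rng.uniform(1.0, 5.0, n)
    x2 = rng.normal(size=n)
    x3 = 0.98 * x2 + np.sqrt(1 - 0.98 ** 2) * rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + 0.5 * x2 + 0.5 * x3 + x1 * rng.normal(size=n)
    X = np.column_stack([x1, x2, x3])
    ols_b2.append(ridge_slopes(X, y, 0.0)[1])
    ridge_b2.append(ridge_slopes(X, y, lam)[1])
mse_ols = np.mean((np.array(ols_b2) - 0.5) ** 2)
mse_ridge = np.mean((np.array(ridge_b2) - 0.5) ** 2)
print(f"MSE of the x2 coefficient  OLS: {mse_ols:.3f}  ridge: {mse_ridge:.3f}")

# 2) Count outcome: a Poisson GLM models log E[y | x] linearly, so fitted means stay positive.
x = rng.uniform(1.0, 5.0, 10_000)
counts = rng.poisson(np.exp(0.3 + 0.4 * x))
beta_pois = poisson_irls(np.column_stack([np.ones_like(x), x]), counts)
print("Poisson GLM coefficients:", np.round(beta_pois, 3))
```

Ridge accepts a little bias, since the x2 and x3 coefficients shrink slightly toward zero, in exchange for a large reduction in variance. That is the reason to switch when the goal is prediction or stable coefficients. If the goal is unbiased inference on a specific coefficient, OLS with robust SEs or a redesign of the regressors is usually the better choice. A GLM is the switch when the outcome's support or mean-variance relationship (counts, binary outcomes, positive skewed values) does not fit the linear-Gaussian model.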
