Analyze omitted-variable bias in regression | Amazon Interview Question
Quick Overview
This question tests understanding of omitted-variable bias and heteroscedasticity in linear regression: how omitting a relevant covariate, and how non-constant error variance, each affect OLS point estimates and inference.
Amazon
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Statistics & Math
Omitted-Variable Bias, Heteroscedasticity, and Remedies
Setup
True data-generating process (DGP):
Y = β0 + β1·Temp + β2·Occupancy + ε
Assumptions: E[ε | Temp, Occupancy] = 0 and Var(ε | Temp, Occupancy) = σ²(Temp) (heteroscedasticity depending on Temp).
Mistake: You fit OLS on Y ~ Temp only (Occupancy omitted).
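The setup can be explored numerically before any derivation. The sketch below simulates the stated DGP and fits the short regression Y ~ Temp using only the Python standard library. All parameter values (β0 = 1, β1 = 2, β2 = 3), the Temp range, the way Occupancy depends on Temp, and the form σ(Temp) = 0.1·Temp are illustrative assumptions, not part of the question. It compares the short-regression slope with the textbook omitted-variable expression β1 + β2·Cov(Temp, Occupancy)/Var(Temp).

```python
import random

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1, BETA2 = 1.0, 2.0, 3.0


def simulate(n=20000, seed=0):
    """Draw (temp, occupancy, y) from the stated DGP under assumed distributions."""
    rng = random.Random(seed)
    temp, occ, y = [], [], []
    for _ in range(n):
        t = rng.uniform(10.0, 35.0)
        # Assumption: hotter days are more occupied, so Cov(Temp, Occupancy) > 0.
        o = 0.5 * t + rng.gauss(0.0, 2.0)
        # Mean-zero error whose standard deviation grows with Temp.
        e = rng.gauss(0.0, 0.1 * t)
        temp.append(t)
        occ.append(o)
        y.append(BETA0 + BETA1 * t + BETA2 * o + e)
    return temp, occ, y


def cov(xs, ys):
    """Sample covariance (divides by n; the ratio of two covariances is unaffected)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / len(xs)


def ols_slope(x, y):
    """Slope of the simple regression y ~ x (with intercept)."""
    return cov(x, y) / cov(x, x)


temp, occ, y = simulate()
short_slope = ols_slope(temp, y)  # Occupancy omitted
predicted = BETA1 + BETA2 * cov(temp, occ) / cov(temp, temp)

print(f"short-regression slope: {short_slope:.3f}")
print(f"beta1 + beta2*Cov/Var : {predicted:.3f}")
print(f"true beta1            : {BETA1:.3f}")
```

Under these assumptions the short-regression slope tracks the omitted-variable expression rather than β1, and the heteroscedastic error does not change that: it affects the precision of the estimate, not where it centers.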
Tasks
Derive the expected (asymptotic) bias of the OLS estimator of β1 when regressing Y on Temp only. Express it in terms of Cov(Temp, Occupancy) and Var(Temp).
State the sign of the bias when hotter days are more occupied.
Explain how weighted least squares (WLS) and heteroscedasticity-robust standard errors (robust SEs) each affect point estimation versus inference in this setting.
Design a test to detect remaining heteroscedasticity: name the test and its null hypothesis.
Propose a practical heteroscedasticity diagnostic you would include in a production training report.
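One candidate for the heteroscedasticity test is the Breusch-Pagan test in Koenker's studentized form, whose null hypothesis is that the error variance does not depend on the regressors (homoscedasticity). The following is a minimal standard-library sketch, not a production implementation: it regresses squared OLS residuals on Temp and uses LM = n·R², which is asymptotically chi-squared with 1 degree of freedom under the null. The simulated data, sample size, and noise scales are assumptions for illustration, and Occupancy is left out of the demo data to keep it short.

```python
import math
import random


def fit_line(x, y):
    """Return (intercept, slope) of the simple OLS fit y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """R^2 of the simple OLS fit y ~ x."""
    a, b = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan(x, y):
    """Studentized (Koenker) Breusch-Pagan test with one regressor.

    H0: Var(error | x) is constant. Returns (LM statistic, p-value).
    """
    a, b = fit_line(x, y)
    sq_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    # Clamp at zero to guard against tiny negative values from rounding.
    lm = max(0.0, len(x) * r_squared(x, sq_resid))
    # Survival function of chi-squared(1): P(X > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2.0))
    return lm, p_value


rng = random.Random(1)
temp = [rng.uniform(10.0, 35.0) for _ in range(5000)]
# Heteroscedastic case: error SD proportional to Temp (assumed form).
y_het = [1.0 + 2.0 * t + rng.gauss(0.0, 0.1 * t) for t in temp]
# Homoscedastic comparison: constant error SD.
y_hom = [1.0 + 2.0 * t + rng.gauss(0.0, 2.0) for t in temp]

lm_het, p_het = breusch_pagan(temp, y_het)
lm_hom, p_hom = breusch_pagan(temp, y_hom)
print(f"heteroscedastic: LM={lm_het:.1f}, p={p_het:.3g}")
print(f"homoscedastic  : LM={lm_hom:.1f}, p={p_hom:.3g}")
```

For a production training report, the same statistic could be logged alongside a plot or table of squared residuals against Temp. To check for heteroscedasticity remaining after a WLS fit, the auxiliary regression would use the weighted residuals instead; that variant is not shown here.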