Analyze omitted-variable bias in regression
Company: Amazon
Role: Data Scientist
Category: Statistics & Math
Difficulty: medium
Interview Round: Onsite
Suppose the true data-generating process for building energy use is Y = β0 + β1·Temp + β2·Occupancy + ε with E[ε|Temp,Occupancy]=0 and Var(ε|Temp,Occupancy)=σ²(Temp), i.e., heteroscedastic. You mistakenly fit OLS on Y ~ Temp only. Derive the expected bias of the OLS estimator for β1 in terms of Cov(Temp,Occupancy) and Var(Temp), state the sign of the bias when hotter days are more occupied, and explain how weighted least squares or robust SEs affect estimation vs inference here. Finally, design a test to detect remaining heteroscedasticity (name the test and null) and propose a practical diagnostic you would include in a production training report.
Quick Answer: This question tests understanding of omitted-variable bias and heteroscedasticity in linear regression, specifically how omitting a relevant covariate and having non-constant error variance each affect OLS point estimates and inference.
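A small simulation can make the pieces of the question concrete. The sketch below draws data from the stated model, fits Y ~ Temp only, and compares the fitted slope with the standard omitted-variable result β1 + β2·Cov(Temp,Occupancy)/Var(Temp), so the bias has the sign of β2·Cov(Temp,Occupancy). It then contrasts classical and heteroscedasticity-robust (HC0) standard errors, which change inference but not the biased point estimate, runs the Koenker (studentized) form of the Breusch-Pagan test (null: the error variance does not depend on Temp), and reports residual variance by Temp bin as one possible report diagnostic. Every parameter value, the variance function exp(Temp), and all function names are illustrative assumptions, not part of the original question.

```python
import math
import random


def simulate(n=20000, beta0=1.0, beta1=2.0, beta2=1.0, gamma=0.5, seed=0):
    """Draw (temp, occupancy, y) from the stated DGP; all values are assumed."""
    rng = random.Random(seed)
    temp, occ, y = [], [], []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        o = gamma * t + rng.gauss(0.0, 1.0)     # gamma > 0: hotter days are more occupied
        e = rng.gauss(0.0, math.exp(0.5 * t))   # Var(e | Temp) = exp(Temp), heteroscedastic
        temp.append(t)
        occ.append(o)
        y.append(beta0 + beta1 * t + beta2 * o + e)
    return temp, occ, y


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / len(a)


def simple_ols(x, y):
    """OLS of y on a constant and x; returns intercept, slope, residuals."""
    slope = cov(x, y) / cov(x, x)
    intercept = mean(y) - slope * mean(x)
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


def slope_standard_errors(x, resid):
    """Classical and HC0 (White) standard errors for the slope."""
    n, mx = len(x), mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    classical = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    robust = math.sqrt(sum((xi - mx) ** 2 * r * r for xi, r in zip(x, resid))) / sxx
    return classical, robust


def breusch_pagan(x, resid):
    """Koenker form: LM = n * R^2 from regressing squared residuals on x.

    H0: homoscedasticity. With one regressor, LM is asymptotically chi2(1),
    whose upper tail probability is erfc(sqrt(LM / 2)).
    """
    u2 = [r * r for r in resid]
    r_squared = cov(x, u2) ** 2 / (cov(x, x) * cov(u2, u2))
    lm = len(x) * r_squared
    return lm, math.erfc(math.sqrt(lm / 2.0))


def binned_residual_variance(x, resid, bins=5):
    """Residual variance within equal-count bins of x, lowest bin first."""
    ordered = [r for _, r in sorted(zip(x, resid))]
    size = len(ordered) // bins
    return [cov(ordered[i * size:(i + 1) * size], ordered[i * size:(i + 1) * size])
            for i in range(bins)]


if __name__ == "__main__":
    temp, occ, y = simulate()
    _, slope, resid = simple_ols(temp, y)
    implied = 2.0 + 1.0 * cov(temp, occ) / cov(temp, temp)  # beta1 + beta2 * Cov / Var
    print("short-regression slope:", slope, "OVB formula:", implied)
    print("classical vs HC0 SE:", slope_standard_errors(temp, resid))
    print("Breusch-Pagan LM, p-value:", breusch_pagan(temp, resid))
    print("residual variance by Temp bin:", binned_residual_variance(temp, resid))
```

Under these assumptions the short-regression slope should land near β1 + β2·gamma rather than β1, whichever standard error is used. The residuals here also absorb the omitted Occupancy term, so a rejection of the null can reflect misspecification as well as the variance function.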