PracHub

Analyze omitted-variable bias in regression

Last updated: Mar 29, 2026

Quick Overview

This question tests understanding of omitted-variable bias and heteroscedasticity in linear regression: how omitting a relevant covariate, and how non-constant error variance, affect OLS point estimates and inference.


Company: Amazon

Role: Data Scientist

Category: Statistics & Math

Difficulty: medium

Interview Round: Onsite

Suppose the true data-generating process for building energy use is Y = β0 + β1·Temp + β2·Occupancy + ε with E[ε|Temp,Occupancy]=0 and Var(ε|Temp,Occupancy)=σ²(Temp), i.e., heteroscedastic. You mistakenly fit OLS on Y ~ Temp only. Derive the expected bias of the OLS estimator for β1 in terms of Cov(Temp,Occupancy) and Var(Temp), state the sign of the bias when hotter days are more occupied, and explain how weighted least squares or robust SEs affect estimation vs inference here. Finally, design a test to detect remaining heteroscedasticity (name the test and null) and propose a practical diagnostic you would include in a production training report.


Oct 13, 2025, 9:49 PM

Omitted-Variable Bias, Heteroscedasticity, and Remedies

Setup

  • True data-generating process (DGP): Y = β0 + β1·Temp + β2·Occupancy + ε
  • Assumptions: E[ε | Temp, Occupancy] = 0 and Var(ε | Temp, Occupancy) = σ²(Temp) (heteroscedasticity depending on Temp).
  • Mistake: You fit OLS on Y ~ Temp only (Occupancy omitted).
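The setup above can be simulated to see what the mistake does to the Temp coefficient. The following is a minimal sketch, not part of the original question: every numeric value (coefficients, the Temp range, the linear link from Temp to Occupancy, and the error scale 0.3·Temp) is an assumption chosen only for illustration. It fits Y on Temp alone and compares the slope with the textbook omitted-variable expression β1 + β2·Cov(Temp, Occupancy)/Var(Temp).

```python
import random

# Illustrative parameter values; none of these numbers come from the question.
BETA0, BETA1, BETA2 = 50.0, 2.0, 0.5


def simulate(n, seed=0):
    """Draw (temp, occupancy, y) from the stated DGP under assumed settings."""
    rng = random.Random(seed)
    temp = [rng.uniform(10.0, 35.0) for _ in range(n)]
    # Assumption: hotter days are more occupied (positive covariance).
    occ = [20.0 + 2.0 * t + rng.gauss(0.0, 10.0) for t in temp]
    # Assumption: the error standard deviation grows with Temp (heteroscedastic).
    y = [BETA0 + BETA1 * t + BETA2 * o + rng.gauss(0.0, 0.3 * t)
         for t, o in zip(temp, occ)]
    return temp, occ, y


def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / (len(a) - 1)


temp, occ, y = simulate(20000)

# OLS slope of Y on Temp only (Occupancy omitted).
slope = cov(temp, y) / cov(temp, temp)

# Omitted-variable expression evaluated with the sample moments.
predicted = BETA1 + BETA2 * cov(temp, occ) / cov(temp, temp)

print(f"short-regression slope: {slope:.3f}")
print(f"beta1 + beta2*Cov/Var:  {predicted:.3f}")
print(f"true beta1:             {BETA1:.3f}")
```

Under these assumed settings the slope settles near β1 plus the bias term rather than near β1, and the heteroscedastic error does not change where it settles.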

Tasks

  1. Derive the expected (asymptotic) bias of the OLS estimator of β1 when regressing Y on Temp only. Express it in terms of Cov(Temp, Occupancy) and Var(Temp).
  2. State the sign of the bias when hotter days are more occupied.
  3. Explain how weighted least squares (WLS) and heteroscedasticity-robust standard errors (robust SEs) affect estimation vs inference in this setting.
  4. Design a test to detect remaining heteroscedasticity: name the test and its null hypothesis.
  5. Propose a practical heteroscedasticity diagnostic you would include in a production training report.
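For tasks 1 and 2, the standard omitted-variable algebra can be sketched as follows. The auxiliary projection coefficients δ0 and δ1, its error u, and the symbol for the short-regression estimator are notation introduced here, not in the original question.

```latex
\[
\begin{aligned}
\text{Occupancy} &= \delta_0 + \delta_1\,\text{Temp} + u,
\qquad
\delta_1 = \frac{\operatorname{Cov}(\text{Temp},\text{Occupancy})}{\operatorname{Var}(\text{Temp})},
\qquad
\operatorname{Cov}(\text{Temp},u) = 0, \\[4pt]
Y &= (\beta_0 + \beta_2\delta_0) + (\beta_1 + \beta_2\delta_1)\,\text{Temp} + (\beta_2 u + \varepsilon), \\[4pt]
\operatorname{plim}\,\tilde{\beta}_1
&= \frac{\operatorname{Cov}(\text{Temp},Y)}{\operatorname{Var}(\text{Temp})}
 = \beta_1 + \beta_2\,\frac{\operatorname{Cov}(\text{Temp},\text{Occupancy})}{\operatorname{Var}(\text{Temp})}, \\[4pt]
\text{Bias}
&= \beta_2\,\frac{\operatorname{Cov}(\text{Temp},\text{Occupancy})}{\operatorname{Var}(\text{Temp})}.
\end{aligned}
\]
```

The third line uses Cov(Temp, ε) = 0, which follows from E[ε | Temp, Occupancy] = 0. When hotter days are more occupied, Cov(Temp, Occupancy) > 0, so the bias has the sign of β2; if occupancy raises energy use (β2 > 0, a plausible assumption the question does not state), the Temp coefficient is biased upward. The variance function σ²(Temp) does not appear in the bias.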

Solution
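A hedged sketch of tasks 3 to 5. Heteroscedasticity-robust standard errors leave the point estimate unchanged and only correct the standard errors, so the resulting interval is still centered on the biased slope. WLS with weights proportional to 1/σ²(Temp) changes the point estimate and can improve efficiency in a correctly specified model, but it still omits Occupancy and so does not in general remove the omitted-variable bias. One common test for remaining heteroscedasticity is Breusch-Pagan, with null hypothesis that the error variance is constant (does not depend on the regressors). The code below uses only the standard library; the simulated data, the HC0 variant, Koenker's studentized form of the test, and the binned-residual diagnostic are all choices made here for illustration (WLS itself is not shown).

```python
import math
import random


def ols_simple(x, y):
    """OLS of y on an intercept and x; returns (intercept, slope, residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid


def slope_standard_errors(x, resid):
    """Classical and HC0 (White) standard errors for the slope."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    s2 = sum(e * e for e in resid) / (n - 2)
    classical = math.sqrt(s2 / sxx)
    hc0 = math.sqrt(sum((xi - mx) ** 2 * e * e for xi, e in zip(x, resid))) / sxx
    return classical, hc0


def breusch_pagan(x, resid):
    """Studentized (Koenker) Breusch-Pagan test with a single regressor.

    H0: Var(error | x) is constant. LM = n * R^2 from regressing squared
    residuals on x; under H0 it is approximately chi-square with 1 df.
    """
    n = len(x)
    e2 = [e * e for e in resid]
    _, _, aux_resid = ols_simple(x, e2)
    mean_e2 = sum(e2) / n
    sst = sum((v - mean_e2) ** 2 for v in e2)
    ssr = sum(r * r for r in aux_resid)
    lm = max(n * (1.0 - ssr / sst), 0.0)
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) upper tail
    return lm, p_value


def binned_residual_sd(x, resid, bins=5):
    """Residual standard deviation within equal-count bins of x."""
    pairs = sorted(zip(x, resid))
    size = len(pairs) // bins
    out = []
    for b in range(bins):
        chunk = [e for _, e in pairs[b * size:(b + 1) * size]]
        m = sum(chunk) / len(chunk)
        out.append(math.sqrt(sum((e - m) ** 2 for e in chunk) / (len(chunk) - 1)))
    return out


# Hypothetical data from the stated DGP (all numeric values are illustrative).
rng = random.Random(1)
temp = [rng.uniform(10.0, 35.0) for _ in range(5000)]
occ = [20.0 + 2.0 * t + rng.gauss(0.0, 10.0) for t in temp]
y = [50.0 + 2.0 * t + 0.5 * o + rng.gauss(0.0, 0.3 * t) for t, o in zip(temp, occ)]

_, slope, resid = ols_simple(temp, y)  # Y ~ Temp only
classical_se, hc0_se = slope_standard_errors(temp, resid)
lm_stat, p_value = breusch_pagan(temp, resid)
bin_sds = binned_residual_sd(temp, resid)

print(f"slope={slope:.3f} classical SE={classical_se:.4f} HC0 SE={hc0_se:.4f}")
print(f"Breusch-Pagan LM={lm_stat:.1f} p={p_value:.3g}")
print("residual SD by Temp bin:", [round(s, 2) for s in bin_sds])
```

For a production training report, one practical option is the last line's table: residual spread by bin of the fitted value or of Temp, reported alongside the test p-value and the ratio of robust to classical standard errors. A spread that trends across bins signals heteroscedasticity. Note that this linear form of the test has little power when the variance is a symmetric, non-monotone function of Temp; adding squared terms to the auxiliary regression (as in White's test) is the usual remedy.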

