PracHub

Diagnose outliers and influence in linear regression

Last updated: May 21, 2026

Quick Overview

This question evaluates a data scientist's competency in diagnosing outliers, high-leverage points, influential observations, and Cook's distance in ordinary least squares regression, including recognition of robust alternatives and rationale for reporting model-review decisions.

  • hard
  • Citadel
  • Machine Learning
  • Data Scientist


Company: Citadel

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

In ordinary least squares, define outliers, high-leverage points, and influential observations. Derive Cook’s distance and explain its relationship to leverage and studentized residuals. Then: (1) Describe a step-by-step diagnostic workflow (plots and statistics) to detect each. (2) Show how conclusions can change if you remove the top-1 influential point, and propose robust alternatives (e.g., Huber, Tukey biweight, RANSAC). (3) Explain how to report and justify decisions about such points in a model review.
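For reference, here is a sketch of one standard derivation. It assumes X has full column rank and uses the following notation: β̂ is the OLS estimate and e = y − Xβ̂ the residuals. H = X(XᵀX)⁻¹Xᵀ is the hat matrix, with diagonal entries h_ii = x_iᵀ(XᵀX)⁻¹x_i, where x_iᵀ is row i of X. The residual variance estimate is s² = eᵀe/(n − p), and a subscript (i) marks a quantity recomputed with observation i deleted. The key step is the leave-one-out update of β̂, which follows from the Sherman-Morrison formula applied to XᵀX − x_i x_iᵀ.

```latex
\begin{aligned}
D_i &= \frac{(\hat\beta - \hat\beta_{(i)})^\top X^\top X\,(\hat\beta - \hat\beta_{(i)})}{p\,s^2}
     = \frac{\lVert X\hat\beta - X\hat\beta_{(i)} \rVert^2}{p\,s^2},\\[4pt]
\hat\beta - \hat\beta_{(i)} &= \frac{(X^\top X)^{-1} x_i\, e_i}{1 - h_{ii}},\\[4pt]
D_i &= \frac{e_i^2\, x_i^\top (X^\top X)^{-1} x_i}{p\,s^2\,(1 - h_{ii})^2}
     = \frac{e_i^2\, h_{ii}}{p\,s^2\,(1 - h_{ii})^2}
     = \frac{r_i^2}{p}\cdot\frac{h_{ii}}{1 - h_{ii}},
\qquad r_i = \frac{e_i}{s\sqrt{1 - h_{ii}}},\\[4pt]
t_i &= \frac{e_i}{s_{(i)}\sqrt{1 - h_{ii}}} = r_i\sqrt{\frac{n - p - 1}{n - p - r_i^2}}.
\end{aligned}
```

The last expression for D_i factors Cook's distance into two terms. The first, r_i², measures outlyingness through the internally studentized residual. The second, h_ii/(1 − h_ii), measures leverage. A point therefore needs both a sizable residual and non-trivial leverage to move the fit. The externally studentized residual t_i is a monotone transform of r_i. Common rules of thumb flag D_i > 4/n, or D_i > 1 as a stricter cut-off.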


Related Interview Questions

  • Analyze Correlations and Generate Gaussians - Citadel (medium)
  • Determine When a Quadratic Has Finite Minimum - Citadel (medium)
  • Choose models for trading tasks - Citadel (hard)
  • Estimate OLS via streaming sufficient statistics - Citadel (hard)
  • Design city home-price prediction system - Citadel (hard)
Citadel
Oct 13, 2025, 9:49 PM
Data Scientist
Technical Screen
Machine Learning

OLS Diagnostics: Outliers, Leverage, Influence, and Cook's Distance

Context

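You are fitting an ordinary least squares (OLS) linear regression with an intercept. Let X be the n×p design matrix (p counts the intercept column) and y the n-vector of responses. The OLS coefficient estimate is β̂ = (XᵀX)⁻¹Xᵀy, the fitted values are ŷ = Xβ̂ = Hy with hat matrix H = X(XᵀX)⁻¹Xᵀ, and the residuals are e = y − ŷ. The diagonal entries h_ii of H are the leverages.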
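The quantities in the Context can be computed directly. The following is a minimal NumPy sketch. It assumes X already contains the intercept column and has full column rank. The function name ols_fit and the simulated data are illustrative only. Rather than forming H, the sketch reads the leverages off a thin QR factorization, since H = QQᵀ when X = QR.

```python
import numpy as np


def ols_fit(X, y):
    """Fit OLS and return (beta_hat, y_hat, e, h).

    X: (n, p) design matrix that already contains the intercept column and
       is assumed to have full column rank.
    y: (n,) response vector.
    h: diagonal of the hat matrix H = X (X^T X)^{-1} X^T (the leverages).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Least-squares solve; avoids forming (X^T X)^{-1} explicitly.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_hat
    e = y - y_hat
    # Reduced QR: X = QR with Q of shape (n, p), so H = Q Q^T and
    # h_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    return beta_hat, y_hat, e, h


# Usage on simulated data (hypothetical: one predictor plus intercept).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

beta_hat, y_hat, e, h = ols_fit(X, y)
print("beta_hat:", beta_hat)
print("sum of leverages (should equal p = 2):", h.sum())
print("max |X^T e| (should be ~0):", np.abs(X.T @ e).max())
```

The leverages sum to p, so their mean is p/n. The common flagging rule h_ii > 2p/n (sometimes 3p/n) therefore marks points with more than twice (or three times) the average leverage.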

Tasks

  1. Define outliers, high-leverage points, and influential observations in OLS.
  2. Derive Cook's distance and explain how it relates to leverage and (studentized) residuals.
  3. Provide a step-by-step diagnostic workflow (plots and statistics) to detect each of the above.
  4. Demonstrate how conclusions can change if you remove the top-1 influential point, and propose robust alternatives (e.g., Huber, Tukey biweight, RANSAC).
  5. Explain how to report and justify decisions about such points in a model review.
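Tasks 2 to 4 can be sketched numerically in a few dozen lines. The following NumPy sketch rests on several assumptions. The data are simulated: a clean linear trend plus one hypothetical high-leverage, off-trend point. The helper names (influence_measures, huber_weights, tukey_weights, irls) are illustrative and do not come from any library. Scale is estimated by the normalized MAD. The sketch computes leverage, studentized residuals, and Cook's distance, then refits without the single most influential point. It also fits Huber and Tukey biweight M-estimators by iteratively reweighted least squares. The tuning constants 1.345 and 4.685 are the conventional choices that give about 95% efficiency under Gaussian errors.

```python
import numpy as np


def influence_measures(X, y):
    """Leverage, studentized residuals, and Cook's distance for an OLS fit.

    Assumes X (n, p) includes the intercept column and has full column rank.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)                        # leverages h_ii
    s2 = e @ e / (n - p)                            # full-data residual variance
    r = e / np.sqrt(s2 * (1.0 - h))                 # internally studentized
    t = r * np.sqrt((n - p - 1) / (n - p - r**2))   # externally studentized
    cooks = (r**2 / p) * h / (1.0 - h)              # Cook's distance
    return beta, h, r, t, cooks


def huber_weights(u, c=1.345):
    # Weight 1 for |u| <= c, c/|u| beyond (no division by zero since c > 0).
    return c / np.maximum(np.abs(u), c)


def tukey_weights(u, c=4.685):
    # Biweight: smooth down-weighting, exactly 0 for |u| >= c.
    return np.where(np.abs(u) < c, (1.0 - (u / c) ** 2) ** 2, 0.0)


def irls(X, y, weight_fn, beta0=None, n_iter=100, tol=1e-8):
    """M-estimation by iteratively reweighted least squares with MAD scale."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0] if beta0 is None else beta0
    for _ in range(n_iter):
        e = y - X @ beta
        scale = 1.4826 * np.median(np.abs(e - np.median(e)))  # normalized MAD
        w = weight_fn(e / max(scale, 1e-12))
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Hypothetical data: clean trend plus one high-leverage, off-trend point.
rng = np.random.default_rng(1)
x = np.append(rng.uniform(0, 10, size=30), 30.0)
y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=31)
y[-1] = 0.0                                          # contaminate the last point
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

beta, h, r, t, cooks = influence_measures(X, y)
k = int(np.argmax(cooks))                            # most influential point
keep = np.arange(n) != k
beta_drop = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
beta_huber = irls(X, y, huber_weights)
beta_tukey = irls(X, y, tukey_weights, beta0=beta_huber)  # non-convex: robust start

print(f"point {k}: h={h[k]:.2f} (2p/n={2 * p / n:.2f}), t={t[k]:.1f}, D={cooks[k]:.1f}")
for name, b in [("OLS, all", beta), ("OLS, drop top-1", beta_drop),
                ("Huber", beta_huber), ("Tukey", beta_tukey)]:
    print(f"{name:16s} intercept={b[0]:6.3f} slope={b[1]:6.3f}")
```

The comparison shows the same data supporting very different slopes depending on how the single flagged point is treated. That is why the point's provenance, and not only its D_i, should drive the decision. The redescending Tukey estimator is started from the Huber fit because its objective is non-convex. Huber-type M-estimators bound the influence of large residuals but not of high-leverage x values. With leverage contamination, a redescending estimator from a robust start, or a high-breakdown method such as RANSAC, is the safer choice. Off-the-shelf equivalents include statsmodels OLSInfluence (hat_matrix_diag, resid_studentized_external, cooks_distance) and statsmodels RLM with the HuberT or TukeyBiweight norms. scikit-learn provides HuberRegressor and RANSACRegressor.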


© 2026 PracHub. All rights reserved.