PracHub

Handle missing values for LGD modeling

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's understanding of missing-data mechanisms and imputation strategies in Loss Given Default (LGD) credit-risk modeling, including recognition of MCAR/MAR/MNAR assumptions, sources of bias and leakage, and validation approaches for performance and stability.
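The mechanism matters because it decides whether simply dropping incomplete rows is safe. The simulation below is a minimal sketch under assumed conditions: synthetic data, a hypothetical `coverage` feature (collateral coverage) that lowers LGD, and made-up missingness probabilities. Under MCAR the complete-case mean LGD should stay close to the full-sample mean. Under MNAR, where low-coverage (high-loss) accounts go unreported more often, it should come out too low.

```python
import random
import statistics


def simulate(n, seed):
    """Synthetic (coverage, lgd) pairs; LGD falls as collateral coverage rises."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        coverage = rng.random()  # hypothetical collateral coverage in [0, 1)
        lgd = 0.8 - 0.6 * coverage + rng.gauss(0.0, 0.1)
        rows.append((coverage, min(1.0, max(0.0, lgd))))  # clip LGD to [0, 1]
    return rows


def complete_case_mean_lgd(rows, p_missing, seed):
    """Mean LGD after dropping rows whose coverage is missing.

    p_missing(coverage) is the assumed probability that coverage goes unreported.
    """
    rng = random.Random(seed)
    kept = [lgd for coverage, lgd in rows if rng.random() >= p_missing(coverage)]
    return statistics.fmean(kept)


rows = simulate(50_000, seed=7)
true_mean = statistics.fmean(lgd for _, lgd in rows)
# MCAR: a flat 30% of coverage values are lost, unrelated to any data.
mcar_mean = complete_case_mean_lgd(rows, lambda c: 0.3, seed=11)
# MNAR: the lower the (unreported) coverage, the likelier it is to be missing.
mnar_mean = complete_case_mean_lgd(rows, lambda c: 0.6 - 0.5 * c, seed=11)
print(f"full={true_mean:.3f}  MCAR={mcar_mean:.3f}  MNAR={mnar_mean:.3f}")
```

In real data the full-sample mean is unknown, so MNAR generally cannot be confirmed from the observed data alone; it has to be argued from knowledge of why a field goes unreported.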

  • medium
  • Citibank
  • Machine Learning
  • Data Scientist


Company: Citibank

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

How would you handle missing values in a dataset used for LGD modeling? Compare multiple imputation, model‑based imputation, business‑rule fills, indicator variables, and conditions where leaving missingness explicit is preferable.
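Multiple imputation and single model-based imputation can share the same imputation model; the difference is whether it is applied once or repeatedly with noise. The sketch below assumes a single incomplete feature `x` (think of a hypothetical collateral coverage ratio) that is MAR given a fully observed covariate `z` (a hypothetical segment score), so the chained-equations loop of MICE collapses to one regression. It estimates the portfolio mean of `x` three ways: complete-case, single regression imputation, and m = 20 stochastic imputations (bootstrap-refit regression plus residual noise) pooled with Rubin's rules.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """OLS fit y = a + b*x; returns (a, b, residual standard deviation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b, statistics.pstdev([y - (a + b * x) for x, y in zip(xs, ys)])


def simulate(n, seed):
    rng = random.Random(seed)
    z = [rng.random() for _ in range(n)]
    x = [0.2 + 0.6 * zi + rng.gauss(0.0, 0.1) for zi in z]
    # MAR: x is missing more often when the observed z is high.
    x_obs = [xi if rng.random() >= 0.1 + 0.6 * zi else None for xi, zi in zip(x, z)]
    return z, x, x_obs


def impute_once(z, x_obs, rng):
    """One stochastic regression imputation; the bootstrap adds parameter uncertainty."""
    complete = [(zi, xi) for zi, xi in zip(z, x_obs) if xi is not None]
    boot = rng.choices(complete, k=len(complete))
    a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
    return [xi if xi is not None else a + b * zi + rng.gauss(0.0, sd)
            for zi, xi in zip(z, x_obs)]


def pool_rubin(estimates, variances):
    """Rubin's rules: pooled estimate and total variance T = W + (1 + 1/m) * B."""
    m = len(estimates)
    w = statistics.fmean(variances)     # within-imputation variance
    b = statistics.variance(estimates)  # between-imputation variance
    return statistics.fmean(estimates), w + (1 + 1 / m) * b


z, x_full, x_obs = simulate(20_000, seed=3)
full_mean = statistics.fmean(x_full)
cc_mean = statistics.fmean(v for v in x_obs if v is not None)

# Model-based single imputation: plug in the regression prediction once, no noise.
a, b, _ = fit_line([zi for zi, xi in zip(z, x_obs) if xi is not None],
                   [xi for xi in x_obs if xi is not None])
single = [xi if xi is not None else a + b * zi for zi, xi in zip(z, x_obs)]
single_var = statistics.variance(single) / len(single)

rng = random.Random(5)
estimates, variances = [], []
for _ in range(20):  # m = 20 imputed datasets
    completed = impute_once(z, x_obs, rng)
    estimates.append(statistics.fmean(completed))
    variances.append(statistics.variance(completed) / len(completed))
mi_mean, mi_var = pool_rubin(estimates, variances)

print(f"full={full_mean:.3f} complete-case={cc_mean:.3f} "
      f"single={statistics.fmean(single):.3f} (SE {math.sqrt(single_var):.4f}) "
      f"MI={mi_mean:.3f} (SE {math.sqrt(mi_var):.4f})")
```

The single-imputation SE comes out smaller than the pooled MI SE because it treats imputed values as if they had been observed; Rubin's between-imputation term B is what restores that uncertainty, which is the usual argument for MI over single imputation when the missing fields are material.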


Related Interview Questions

  • Diagnose and fix linear regression assumption breaks - Citibank (medium)
  • Discuss logistic regression limitations for PD - Citibank (medium)
  • Identify top exposures and mitigate - Citibank (medium)
  • Compute EL and RWA from loan data - Citibank (medium)
  • Explain PD model validation steps - Citibank (medium)
Citibank • Data Scientist • Technical Screen • Machine Learning • Jul 26, 2025

Handling Missing Values for LGD Modeling

Context

You are building a Loss Given Default (LGD) model using account- and borrower-level features captured around the time of default. The dataset contains both continuous and categorical variables with non-trivial missingness due to reporting gaps, system migrations, and process differences across products/regions.
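Because the missingness here is attributed to reporting gaps, system migrations, and product/region process differences, a useful first diagnostic is the missing rate of each field by segment. Rates that differ sharply across segments rule out MCAR (missingness depends on observed data), although they cannot separate MAR from MNAR. A minimal sketch; the field names and toy records are hypothetical, and a migration-period or default-vintage key could be added to `segment_keys` in the same way.

```python
from collections import defaultdict


def missing_rate_by_segment(records, field, segment_keys):
    """Share of records with `field` missing (None) within each segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [missing, total]
    for rec in records:
        seg = tuple(rec[k] for k in segment_keys)
        counts[seg][1] += 1
        if rec.get(field) is None:
            counts[seg][0] += 1
    return {seg: miss / total for seg, (miss, total) in counts.items()}


# Toy records for illustration only; field names are hypothetical.
records = [
    {"product": "mortgage", "region": "NA", "collateral_value": 250_000.0},
    {"product": "mortgage", "region": "NA", "collateral_value": None},
    {"product": "mortgage", "region": "EU", "collateral_value": 180_000.0},
    {"product": "mortgage", "region": "EU", "collateral_value": 210_000.0},
    {"product": "card", "region": "NA", "collateral_value": None},
    {"product": "card", "region": "NA", "collateral_value": None},
]
rates = missing_rate_by_segment(records, "collateral_value", ("product", "region"))
for seg, rate in sorted(rates.items()):
    print(seg, f"{rate:.0%}")
```

In this toy data every card record lacks `collateral_value` because unsecured products have no collateral. That is structural, "not applicable" missingness, which a business rule should encode rather than an imputer should guess.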

Task

Describe how you would handle missing values in this LGD modeling context. Specifically, compare the following approaches:

  1. Multiple imputation (e.g., MICE)
  2. Model-based imputation (e.g., kNN, random forest, regression)
  3. Business-rule fills (domain-driven heuristics)
  4. Indicator variables (missingness flags; Unknown category)
  5. Leaving missingness explicit (letting the model handle NA directly)
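Approaches 3 and 4 can be combined in one preprocessing step, provided any learned fill value comes from the training split only. The sketch below assumes hypothetical fields (`product`, `region`, `collateral_value`, `months_in_arrears`) and a hypothetical rule that a missing collateral value on an unsecured card means "no collateral". Remaining numeric gaps get a training-median fill plus a `_missing` flag, and a missing categorical gets an explicit `Unknown` level.

```python
import statistics


def fit_fill_values(train_rows, numeric_fields):
    """Learn median fill values from the training split only, to avoid leakage."""
    return {
        f: statistics.median([r[f] for r in train_rows if r.get(f) is not None])
        for f in numeric_fields
    }


def transform(row, fills):
    """Apply a business rule, missingness flags, and learned fills to one record."""
    out = dict(row)
    # Hypothetical business rule: unsecured cards carry no collateral, so a
    # missing collateral value means "not applicable" and is set to 0.
    if out.get("product") == "card" and out.get("collateral_value") is None:
        out["collateral_value"] = 0.0
    for field, fill in fills.items():
        out[field + "_missing"] = int(out.get(field) is None)  # keep the signal
        if out[field + "_missing"]:
            out[field] = fill
    # Categorical gap: an explicit "Unknown" level instead of a guessed category.
    if out.get("region") is None:
        out["region"] = "Unknown"
    return out


train = [
    {"product": "mortgage", "region": "NA", "collateral_value": 200_000.0, "months_in_arrears": 6},
    {"product": "mortgage", "region": "EU", "collateral_value": 300_000.0, "months_in_arrears": None},
    {"product": "mortgage", "region": None, "collateral_value": None, "months_in_arrears": 4},
]
fills = fit_fill_values(train, ["collateral_value", "months_in_arrears"])
new_row = {"product": "card", "region": None, "collateral_value": None, "months_in_arrears": None}
print(fills)
print(transform(new_row, fills))
```

For approach 5 the fill step would be skipped and NaN passed straight to a learner that supports missing values natively, for example scikit-learn's `HistGradientBoostingRegressor`, which learns at each split which child missing values should go to. Flags may then be redundant for the model but can still be useful for monitoring.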

For each, explain:

  • Assumptions about the missingness mechanism (MCAR, MAR, MNAR)
  • Pros, cons, and typical use cases in LGD modeling
  • Guardrails to avoid bias and leakage
  • How you would validate the choice and measure impact on LGD performance and stability
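One way to validate the choice is to score each candidate strategy out of sample, refitting the imputation inside every fold so that no fill value is learned from validation rows. The sketch below is a toy under stated assumptions: synthetic data with a single hypothetical coverage feature whose low values are more often missing (MNAR), a one-feature linear LGD model, RMSE as the performance metric, and the spread of fold-level RMSE as a crude stability signal. Real validation would add out-of-time and per-segment checks.

```python
import math
import random
import statistics


def simulate(n, seed):
    """Synthetic (coverage or None, lgd) rows; low coverage is more often missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        cov = rng.random()
        lgd = min(1.0, max(0.0, 0.8 - 0.6 * cov + rng.gauss(0.0, 0.1)))
        missing = rng.random() < 0.6 - 0.5 * cov
        rows.append((None if missing else cov, lgd))
    return rows


def fit_line(xs, ys):
    """OLS fit y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def fit_predict(train, test, use_flag):
    """Fit imputation and model on `train` only, then predict LGD for `test`."""
    observed = [(x, y) for x, y in train if x is not None]
    if use_flag:
        # With one feature, a linear model on [filled x, missing flag] equals a
        # slope fit on observed rows plus a separate level for missing rows.
        a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
        miss_level = statistics.fmean(y for x, y in train if x is None)
        return [miss_level if x is None else a + b * x for x, _ in test]
    fill = statistics.fmean(x for x, _ in observed)  # training-fold mean only
    a, b = fit_line([fill if x is None else x for x, _ in train], [y for _, y in train])
    return [a + b * (fill if x is None else x) for x, _ in test]


def cv_rmse(rows, use_flag, k=5):
    """k-fold CV RMSE (mean, std across folds); imputation refit in every fold."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        preds = fit_predict(train, test, use_flag)
        mse = statistics.fmean((p - y) ** 2 for p, (_, y) in zip(preds, test))
        scores.append(math.sqrt(mse))
    return statistics.fmean(scores), statistics.stdev(scores)


rows = simulate(20_000, seed=1)
results = {
    "mean_fill": cv_rmse(rows, use_flag=False),
    "mean_fill+flag": cv_rmse(rows, use_flag=True),
}
for name, (mean_rmse, sd_rmse) in results.items():
    print(f"{name:15s} RMSE {mean_rmse:.4f} +/- {sd_rmse:.4f} across folds")
```

On this MNAR toy the flagged version should score a lower RMSE, because missingness itself carries information about LGD. Under MCAR the two would be expected to score about the same, and that comparison is the evidence for or against keeping the flag.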
