Handle missing values for LGD modeling

This is a Machine Learning interview question from Citibank for Data Scientist roles.

Question

Handling Missing Values for LGD Modeling

Context

You are building a Loss Given Default (LGD) model using account- and borrower-level features captured around the time of default. The dataset contains both continuous and categorical variables with non-trivial missingness due to reporting gaps, system migrations, and process differences across products/regions.

Task

Describe how you would handle missing values in this LGD modeling context. Specifically, compare the following approaches:

Multiple imputation (e.g., MICE)
Model-based imputation (e.g., kNN, random forest, regression)
Business-rule fills (domain-driven heuristics)
Indicator variables (missingness flags; Unknown category)
Leaving missingness explicit (letting the model handle NA directly)
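To make the indicator-variable and Unknown-category options concrete, here is a minimal sketch using only the Python standard library. It assumes a simple median fill for numeric fields, and the column names (`ltv`, `collateral_type`) and records are hypothetical. The fill values are learned from the training rows only and then reused on held-out rows, which is one common guardrail against leakage. In practice a data-frame or modeling library would usually do this work.

```python
from statistics import median


def fit_fill_values(train_rows, numeric_cols):
    """Learn one median per numeric column from TRAINING rows only."""
    fills = {}
    for col in numeric_cols:
        observed = [r[col] for r in train_rows if r[col] is not None]
        fills[col] = median(observed)
    return fills


def apply_missing_handling(rows, numeric_cols, categorical_cols, fills):
    """Add a 0/1 missingness flag per numeric column, fill with the training
    median, and map missing categoricals to an explicit 'Unknown' level."""
    out = []
    for r in rows:
        new = dict(r)
        for col in numeric_cols:
            missing = r[col] is None
            new[col + "_missing"] = int(missing)
            if missing:
                new[col] = fills[col]
        for col in categorical_cols:
            if r[col] is None:
                new[col] = "Unknown"
        out.append(new)
    return out


# Hypothetical records; names and values are illustrative only.
train = [
    {"ltv": 0.80, "collateral_type": "residential"},
    {"ltv": None, "collateral_type": None},
    {"ltv": 0.60, "collateral_type": "commercial"},
    {"ltv": 1.10, "collateral_type": "residential"},
]
test = [{"ltv": None, "collateral_type": "commercial"}]

fills = fit_fill_values(train, ["ltv"])  # fitted on train only
train_clean = apply_missing_handling(train, ["ltv"], ["collateral_type"], fills)
test_clean = apply_missing_handling(test, ["ltv"], ["collateral_type"], fills)
```

Keeping the flag alongside the filled value lets a downstream model learn whether missingness itself carries signal, rather than treating a filled value as if it had been observed.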

For each, explain:

Assumptions about the missingness mechanism (MCAR, MAR, MNAR)
Pros, cons, and typical use cases in LGD modeling
Guardrails to avoid bias and leakage
How you would validate the choice and measure impact on LGD performance and stability
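The missingness mechanisms listed above matter because they determine whether a simple fill is biased. The simulation below is a rough sketch on synthetic data, not a result from any real portfolio. It draws a hypothetical feature uniformly on [0, 1], then removes values either completely at random (MCAR) or more often when the value is low (MNAR), and compares the mean of the observed values with the true mean. The drop probabilities are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical feature values on [0, 1]; purely synthetic.
feature = [random.random() for _ in range(10_000)]

# MCAR: every value is dropped with the same probability (30%),
# regardless of its size.
mcar_observed = [x for x in feature if random.random() > 0.3]

# MNAR: low values are dropped far more often (60%) than the rest (10%),
# so missingness depends on the unobserved value itself.
mnar_observed = [
    x for x in feature if random.random() > (0.6 if x < 0.3 else 0.1)
]

true_mean = mean(feature)
mcar_bias = mean(mcar_observed) - true_mean  # expected to be near zero
mnar_bias = mean(mnar_observed) - true_mean  # expected to be clearly positive

print(f"MCAR bias: {mcar_bias:+.3f}")
print(f"MNAR bias: {mnar_bias:+.3f}")
```

Under MCAR the observed mean stays close to the true mean, so a mean fill is roughly unbiased. Under MNAR the observed values over-represent high values, so a mean fill would systematically overstate the feature for the missing rows.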
