This question evaluates a data scientist's understanding of missing-data mechanisms and imputation strategies in Loss Given Default (LGD) credit-risk modeling, including recognition of MCAR/MAR/MNAR assumptions, sources of bias and leakage, and validation approaches for performance and stability.
You are building a Loss Given Default (LGD) model using account- and borrower-level features captured around the time of default. The dataset contains both continuous and categorical variables with non-trivial missingness due to reporting gaps, system migrations, and process differences across products/regions.
Describe how you would handle missing values in this LGD modeling context. Specifically, compare the following approaches:
For each, explain:
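As a first diagnostic for the scenario above, it can help to check whether missingness rates differ across products or regions before choosing any imputation approach. Clearly different rates across segments are evidence against MCAR, although they cannot by themselves distinguish MAR from MNAR. The following is a minimal sketch under stated assumptions: the record layout, the field names `region` and `collateral_value`, and the helper `missing_rate_by_segment` are hypothetical and do not come from the question itself.

```python
from collections import defaultdict


def missing_rate_by_segment(rows, feature, segment_key):
    """Return {segment: share of rows in which `feature` is missing (None)}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        seg = row[segment_key]
        total[seg] += 1
        # Treat an absent key and an explicit None the same way.
        if row.get(feature) is None:
            missing[seg] += 1
    return {seg: missing[seg] / total[seg] for seg in total}


# Hypothetical toy records; values and field names are illustrative only.
accounts = [
    {"region": "north", "collateral_value": 120.0},
    {"region": "north", "collateral_value": None},
    {"region": "south", "collateral_value": None},
    {"region": "south", "collateral_value": None},
    {"region": "south", "collateral_value": 80.0},
    {"region": "south", "collateral_value": None},
]

rates = missing_rate_by_segment(accounts, "collateral_value", "region")
for region, rate in sorted(rates.items()):
    print(f"{region}: {rate:.0%} missing")
```

In a real LGD dataset the same tabulation would typically be repeated per feature and per default-date cohort, since system migrations tend to show up as step changes in missingness over time.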