PracHub

Explain PD model validation steps

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in PD model validation and credit risk analytics, covering data partitioning and leakage control, discrimination and calibration assessment, stability and drift monitoring, backtesting, challenger governance, and model risk documentation.

Company: Citibank

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

Explain how you would validate a newly developed PD model. Discuss data partitioning, discrimination metrics (e.g., AUC, KS), calibration checks, stability monitoring (e.g., PSI), backtesting, challenger models, and documentation.

Jul 26, 2025, 12:00 AM

Validate a Newly Developed Probability of Default (PD) Model

Context

Assume you have built a retail credit Probability of Default (PD) model with a 12‑month default horizon using historical applications and realized default outcomes. You are asked to outline how you would validate this model before deployment and set up ongoing monitoring.

Task

Describe a practical, end‑to‑end validation plan that covers:

  1. Data partitioning and leakage control (including time-based splits and class imbalance handling).
  2. Discrimination metrics and interpretation (e.g., AUC/ROC, Gini, KS, PR‑AUC, lift).
  3. Calibration checks and fixes (e.g., Brier score, reliability curves, intercept/slope tests, Hosmer–Lemeshow, recalibration methods).
  4. Stability monitoring for drift (e.g., PSI/CSI, segmentation, thresholds, triggers).
  5. Backtesting against realized defaults over time (e.g., E/O by bands, statistical tests, vintages).
  6. Challenger models and champion–challenger governance.
  7. Documentation and controls for model risk management.

Be explicit about key assumptions, typical thresholds, common pitfalls, and how you would validate results statistically. Where helpful, include small numeric examples or formulas.
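
As an illustration of the "small numeric examples or formulas" the task asks for, here is a minimal Python sketch of three of the listed checks: discrimination (AUC, Gini, KS), calibration (Brier score), and stability (PSI). It is not the reference solution. The function names, the toy scores, and the bin counts are all hypothetical, and the PSI bands mentioned in the comments are a common rule of thumb rather than a regulatory standard.

```python
import math

def auc_gini_ks(scores, labels):
    """Return (AUC, Gini, KS) for predicted PDs and 0/1 default flags."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # defaulters
    neg = [s for s, y in zip(scores, labels) if y == 0]  # non-defaulters
    # AUC as the probability that a random defaulter scores above a random
    # non-defaulter (Mann-Whitney form); ties count as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    # KS: largest gap between the two empirical score CDFs.
    ks = max(
        abs(sum(n <= t for n in neg) / len(neg)
            - sum(p <= t for p in pos) / len(pos))
        for t in set(scores)
    )
    return auc, 2 * auc - 1, ks  # Gini = 2 * AUC - 1

def brier_score(scores, labels):
    """Mean squared difference between predicted PD and realized outcome."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index over matching score bins.

    Rule of thumb (an assumption, not a standard): below 0.10 is usually
    read as stable, 0.10 to 0.25 as a moderate shift, above 0.25 as a
    significant shift.
    """
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe = max(e / e_total, eps)  # floor avoids log(0) on empty bins
        pa = max(a / a_total, eps)
        total += (pa - pe) * math.log(pa / pe)
    return total

# Toy data, invented for illustration only.
toy_scores = [0.02, 0.05, 0.10, 0.20, 0.40, 0.60]
toy_labels = [0, 0, 1, 0, 1, 1]
auc, gini, ks = auc_gini_ks(toy_scores, toy_labels)
brier = brier_score(toy_scores, toy_labels)
shift = psi([50, 50], [60, 40])  # development vs. recent bin counts
print(f"AUC={auc:.3f} Gini={gini:.3f} KS={ks:.3f} "
      f"Brier={brier:.3f} PSI={shift:.3f}")
```

The pairwise AUC loop is quadratic in sample size, which is fine for a worked example but not for a production portfolio; a library routine would normally replace it. The metrics should be computed on an out-of-time holdout rather than the development sample, in line with item 1 of the task.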
