##### Scenario
On-site behavioral conversation on fairness in lending
##### Question
Regulators require ‘fair lending’. The portfolio currently issues 50 % of loans to women and 50 % to men. Does this guarantee fairness? Explain the additional analyses you would perform. Why are you interested in machine learning? Describe a past failure and what you would do differently.
##### Hints
Discuss qualified-applicant mix, acceptance rates, pricing parity, disparate impact tests.
Quick Answer: This question evaluates a data scientist's understanding of fair lending principles, statistical fairness metrics, funnel and selection-rate analyses, ethical judgment in model-driven decisions, and behavioral leadership communication within the Behavioral & Leadership category.
##### Solution
## 1) Does a 50/50 issuance split guarantee fair lending?
Short answer: No. A 50% women / 50% men share of funded loans does not, by itself, establish fairness. It can mask unfairness at multiple points in the funnel:
- Applicant mix vs. approvals: If 70% of applicants are women but only 50% of funded loans go to women, women might be under-approved.
- Selection rates: Even with a 50/50 funded split, acceptance rates (approvals per applicant) could be lower for one group.
- Risk/price parity: Women could be priced higher (APR) than men after controlling for risk, which may indicate disparate treatment/impact.
- Stage-specific issues: Marketing, prequalification, underwriting, pricing, line assignment, and verification can each introduce bias.
Regulatory context: Fair lending law prohibits disparate treatment, and it prohibits disparate impact unless the practice is justified by business necessity and no less discriminatory alternative exists. Outcome parity alone (50/50) is not a recognized safe harbor.
---
## 2) Additional analyses to assess fairness
Think in terms of a funnel, risk adjustment, and multiple fairness metrics. Below is a pragmatic plan with formulas and small examples.
### A) Define groups, data, and segments
- Protected classes to monitor: sex/gender (example here), race/ethnicity, age, marital status, etc. Use protected attributes only for auditing, not for decisioning.
- Segment by product, channel, geography, and time window (to control confounding).
- Ensure sufficient sample size; report confidence intervals.
### B) Applicant mix vs. qualified-applicant mix
- Compute applicant share and qualified share by group. Define "qualified" using a risk threshold independent of protected class (e.g., PD < 8%).
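- Example: Applicants: 6,000 women, 4,000 men (a 60/40 applicant mix). Qualified (PD < 8%): 3,000 women (50% of female applicants), 3,000 men (75% of male applicants), so the qualified mix is 50/50. A 50/50 funded split then matches the qualified mix even though it trails women's 60% applicant share. Had the qualified mix also been 60/40, the same 50/50 split would point to under-approval of qualified women. Compare against both benchmarks, and check separately whether the qualification criterion itself produces disparate impact.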
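The comparison can be sketched in Python using the counts from the example above. The dictionary layout and the `shares` helper are illustrative assumptions, not part of any lending library.

```python
def shares(counts):
    """Return each group's share of the total count."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Counts taken from the example; the funded split is the 50/50 portfolio.
applicants = {"women": 6000, "men": 4000}
qualified = {"women": 3000, "men": 3000}   # PD < 8%
funded_share = {"women": 0.5, "men": 0.5}

applicant_share = shares(applicants)
qualified_share = shares(qualified)
qualification_rate = {g: qualified[g] / applicants[g] for g in applicants}

for g in applicants:
    print(
        f"{g}: applicant share {applicant_share[g]:.0%}, "
        f"qualified share {qualified_share[g]:.0%}, "
        f"qualification rate {qualification_rate[g]:.0%}, "
        f"funded minus applicant share {funded_share[g] - applicant_share[g]:+.0%}, "
        f"funded minus qualified share {funded_share[g] - qualified_share[g]:+.0%}"
    )
```

Reporting the funded share against both the applicant mix and the qualified mix makes it explicit which benchmark a gap is measured against.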
### C) Stage-by-stage funnel analysis
For each stage (impression → click → application → verified → approved → booked), compute rates by group.
- Selection (approval) rate: SR_g = approvals_g / applicants_g
- Booking rate: BR_g = funded_g / applicants_g
- Drop-off diagnostics identify where disparities originate (e.g., identity verification or income verification steps).
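A minimal sketch of the stage-by-stage comparison, assuming per-group counts are available at every stage. All counts below are hypothetical, and SR and BR use applicants as the denominator, as defined above.

```python
STAGES = ["impression", "click", "application", "verified", "approved", "booked"]

# Hypothetical counts per stage, in the same order as STAGES.
funnel = {
    "women": [10000, 2000, 1000, 700, 400, 300],
    "men":   [10000, 2000, 1000, 900, 600, 450],
}

def stage_conversion(counts):
    """Conversion rate from each stage to the next one."""
    return [nxt / cur if cur else float("nan")
            for cur, nxt in zip(counts, counts[1:])]

conversion = {g: stage_conversion(c) for g, c in funnel.items()}

app = STAGES.index("application")
selection_rate = {g: c[STAGES.index("approved")] / c[app] for g, c in funnel.items()}
booking_rate = {g: c[STAGES.index("booked")] / c[app] for g, c in funnel.items()}

# Find the transition where the two groups diverge the most.
gaps = [abs(w - m) for w, m in zip(conversion["women"], conversion["men"])]
worst = max(range(len(gaps)), key=gaps.__getitem__)
worst_transition = f"{STAGES[worst]} -> {STAGES[worst + 1]}"
print(worst_transition, selection_rate, booking_rate)
```

In this toy data the largest gap sits at the verification step, which is the kind of drop-off a funnel diagnostic is meant to surface.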
### D) Disparate impact test (80% rule)
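- Disparate Impact Ratio (DIR) = SR_protected / SR_reference, where the reference group is usually the one with the highest selection rate.
- Common heuristic (the four-fifths rule, borrowed from employment-selection guidelines): DIR < 0.8 may indicate disparate impact. Treat it as a screening flag, not a legal bright line.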
- Example: Women SR = 40% (400/1000), Men SR = 60% (1200/2000). DIR = 0.40 / 0.60 = 0.67 < 0.8 → potential concern. Include CIs to avoid false flags with small N.
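The example can be reproduced with a confidence interval attached. This sketch assumes a large-sample interval on the log of the ratio (the usual approach for a ratio of two proportions); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def disparate_impact_ratio(appr_g, n_g, appr_ref, n_ref, level=0.95):
    """Return (DIR, lower, upper) using a normal approximation on log(DIR).

    Requires at least one approval in each group.
    """
    ratio = (appr_g / n_g) / (appr_ref / n_ref)
    # Standard error of the log ratio of two independent proportions.
    se = math.sqrt(1 / appr_g - 1 / n_g + 1 / appr_ref - 1 / n_ref)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper

# Counts from the example: women 400/1000, men 1200/2000.
dir_value, dir_lower, dir_upper = disparate_impact_ratio(400, 1000, 1200, 2000)
flag = dir_upper < 0.8  # the whole interval sits below the 0.8 heuristic
print(f"DIR={dir_value:.2f}, CI=({dir_lower:.2f}, {dir_upper:.2f}), flag={flag}")
```

Flagging only when the upper bound falls below 0.8 is one conservative convention; with small samples the interval widens and a point estimate below 0.8 may not be flagged.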
### E) Risk-adjusted approval fairness ("equal opportunity")
- Compare approval rates conditional on creditworthiness. Methods:
  1) Banding: Compare approval rates within score/PD bands.
  2) Regression: logit(P(Approve)) = α + β·Risk + γ·Group + δ·Controls. Test H0: γ = 0. A statistically significant γ after controlling for risk and other features suggests an unexplained disparity.
- Metric: Equal Opportunity Difference = TPR_group − TPR_reference among truly qualified. Target near 0.
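A small sketch of the Equal Opportunity Difference, assuming each record carries a group label, a "truly qualified" flag, and the approval decision. The records are hypothetical, and in practice the qualified flag is not observed for rejected applicants, so it has to be estimated.

```python
def make(group, qualified, approved, n):
    """Build n identical hypothetical records."""
    return [{"group": group, "qualified": qualified, "approved": approved}] * n

records = (
    make("women", True, True, 70) + make("women", True, False, 30)
    + make("men", True, True, 85) + make("men", True, False, 15)
    + make("women", False, True, 5) + make("men", False, True, 5)
)

def true_positive_rate(rows, group):
    """Share of truly qualified applicants in `group` who were approved."""
    qualified = [r for r in rows if r["group"] == group and r["qualified"]]
    if not qualified:
        raise ValueError(f"no qualified applicants in group {group!r}")
    return sum(r["approved"] for r in qualified) / len(qualified)

tpr_women = true_positive_rate(records, "women")
tpr_men = true_positive_rate(records, "men")
eo_difference = tpr_women - tpr_men  # target: near 0
print(f"TPR women={tpr_women:.2f}, men={tpr_men:.2f}, EOD={eo_difference:+.2f}")
```

The same function can be applied within each score or PD band to get the banded comparison.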
### F) Model performance parity
- AUC and Brier score by group; calibration curves by group. Look for:
  - Calibration within groups: P(Default | score s) aligns with predicted PD in each group.
  - Over/underestimation in one group can cause unfair thresholds/pricing.
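Calibration within groups can be checked with a simple binned table. This is a sketch on hypothetical rows of (group, predicted PD, defaulted); the band edges are arbitrary assumptions.

```python
from collections import defaultdict

def calibration_by_group(rows, bin_edges):
    """Compare mean predicted PD with the observed default rate per (group, band)."""
    acc = defaultdict(lambda: [0, 0.0, 0])  # count, sum of predicted PD, defaults
    for group, pd_hat, defaulted in rows:
        band = sum(pd_hat >= edge for edge in bin_edges)  # index of the PD band
        cell = acc[(group, band)]
        cell[0] += 1
        cell[1] += pd_hat
        cell[2] += int(defaulted)
    return {
        key: {"n": n, "mean_pd": pd_sum / n, "observed": defaults / n}
        for key, (n, pd_sum, defaults) in acc.items()
    }

# Hypothetical booked loans: same predicted PD, different realized default rates.
rows = (
    [("women", 0.05, True)] * 5 + [("women", 0.05, False)] * 95
    + [("men", 0.05, True)] * 10 + [("men", 0.05, False)] * 90
)
table = calibration_by_group(rows, bin_edges=[0.03, 0.08])
gaps = {key: cell["observed"] - cell["mean_pd"] for key, cell in table.items()}
for key, gap in sorted(gaps.items()):
    print(key, f"observed minus predicted = {gap:+.3f}")
```

A gap near zero in one group and a clear gap in another means the same score implies different risk by group, which then feeds through to thresholds and pricing.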
### G) Pricing parity (risk-adjusted)
Price should reflect risk and costs, not group membership or proxies.
- Model: APR_i ≈ Base + k · ExpectedLoss_i + CostMargin.
- Test residuals: Regress APR on risk and features, then check the group coefficient:
  - APR_i = α + β·PD_i + θ·LGD_i + φ·Controls_i + γ·Group_i + ε_i, where Controls_i covers term and amount.
  - Test H0: γ = 0.
- Small example: If women and men both have PD = 3%, LGD = 40%, but average APR differs by +60 bps for women after controls → potential pricing disparity.
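The group coefficient can be estimated with ordinary least squares. The sketch below uses a hand-rolled solver and synthetic, noise-free data with a built-in 60 bps gap, so it only demonstrates the point estimate of γ; testing H0 on real data needs standard errors from a statistics package. All coefficients in the data generator are assumptions.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic loans: every PD/LGD combination appears once per group (1 = women).
loans = [
    (pd_, lgd, group)
    for pd_ in (0.01, 0.02, 0.03, 0.05)
    for lgd in (0.3, 0.4, 0.5)
    for group in (0, 1)
]
X = [[1.0, pd_, lgd, float(group)] for pd_, lgd, group in loans]
# Assumed pricing rule with a 60 bps group effect baked in.
y = [0.06 + 0.8 * pd_ + 0.05 * lgd + 0.006 * group for pd_, lgd, group in loans]

alpha, beta, theta, gamma = ols(X, y)
gamma_bps = gamma * 1e4
print(f"group coefficient: {gamma_bps:.1f} bps")
```

Because risk inputs are balanced across groups in this toy data, the recovered γ equals the injected gap; with real data, omitted risk drivers would bias it.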
### H) Adverse action reasons
- Compare frequency and ordering of reason codes by group at similar risk levels. Unexpected differences may indicate proxy features driving denials.
### I) Feature audit for proxies
- Examine correlation of features (e.g., geography, education, device, employer) with protected classes. Remove or constrain features that act as proxies without business necessity.
- Use monotonicity or fairness constraints where feasible.
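A first-pass proxy screen can be as simple as correlating each candidate feature with the protected indicator. The feature names, values, and the 0.3 cutoff below are hypothetical; correlation only catches linear, single-feature proxies, so it is a starting point rather than a full audit.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Protected indicator is used for auditing only, never as a model input.
is_woman = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "part_time_flag": [1, 1, 1, 0, 0, 0, 0, 1],
    "loan_amount_k": [10, 20, 30, 40, 10, 20, 30, 40],
}

PROXY_CUTOFF = 0.3  # assumed review threshold
correlations = {name: pearson(values, is_woman) for name, values in features.items()}
flagged = [name for name, r in correlations.items() if abs(r) >= PROXY_CUTOFF]
print(correlations, flagged)
```

Flagged features would then go through a business-necessity review before being kept, constrained, or removed.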
### J) Intersectional and geographic analysis
- Analyze intersections (e.g., sex × race × age) to avoid masking effects.
- For geography (mortgage or geo-targeted products): check redlining/digital redlining via tract-level minority share vs. approval/marketing exposure.
### K) Missing labels, selection bias, and reject inference
- Protected labels may be missing; if using imputation (e.g., BISG), quantify error and use only for monitoring.
- Outcomes are observed only for approved loans. Use:
  - Reject inference methods (e.g., augmentation, parceling, IPW) with sensitivity analysis.
  - Safe, small randomized approvals near decision boundary to get unbiased outcome data (with strict risk guardrails).
### L) Threshold optimization under fairness constraints
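- Tune the decision threshold, or apply post-processing, to meet fairness targets (e.g., equal opportunity parity) while controlling expected loss and revenue. Group-specific thresholds conflict with using protected attributes only for auditing (section A) and can themselves amount to disparate treatment, so prefer a single group-blind policy and involve legal/compliance before considering anything else.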
- Document trade-offs (utility vs. fairness) and select policy with governance approval.
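The trade-off can be illustrated with a sweep over a single global PD cutoff, where group labels are used only to audit each candidate. Applicant PDs, candidate cutoffs, and both constraint values are hypothetical.

```python
applicants = (
    [("women", p) for p in (0.02, 0.04, 0.06, 0.09, 0.12)]
    + [("men", p) for p in (0.01, 0.03, 0.05, 0.07, 0.11)]
)

def evaluate(threshold):
    """Approve when PD < threshold, then audit approval rates by group."""
    approved = [(g, p) for g, p in applicants if p < threshold]
    rates = {}
    for group in ("women", "men"):
        n = sum(1 for g, _ in applicants if g == group)
        rates[group] = sum(1 for g, _ in approved if g == group) / n
    top = max(rates.values())
    ratio = min(rates.values()) / top if top > 0 else float("nan")
    mean_pd = sum(p for _, p in approved) / len(approved) if approved else 0.0
    return {"threshold": threshold, "approved": len(approved),
            "dir": ratio, "mean_pd": mean_pd}

MIN_DIR = 0.8        # fairness constraint (four-fifths heuristic)
MAX_MEAN_PD = 0.05   # assumed risk guardrail on the approved book

candidates = [evaluate(t) for t in (0.05, 0.08, 0.10, 0.13)]
feasible = [c for c in candidates
            if c["dir"] >= MIN_DIR and c["mean_pd"] <= MAX_MEAN_PD]
best = max(feasible, key=lambda c: c["approved"])  # most approvals among feasible
for c in candidates:
    print(c)
print("selected:", best["threshold"])
```

Listing every candidate with its DIR and mean PD, not just the winner, gives the governance review the utility-versus-fairness trade-off in one table.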
### M) Monitoring and governance
- Establish ongoing fairness dashboards (DIR, EO difference, calibration, pricing residuals) with control limits and alerting.
- Pre-deployment fairness review; post-deployment audits; model cards and change logs.
---
## 3) Why I am interested in machine learning (template to personalize)
- Impact at scale: ML turns data into decisions that expand access to affordable credit while managing risk—improving outcomes for millions of applicants.
- Scientific problem-solving: It blends causal thinking, probabilistic modeling, and optimization to make principled, testable improvements.
- Responsible innovation: I’m motivated by building models that are accurate, interpretable, and fair—balancing business goals with societal responsibility.
- Example to personalize: "I built a calibrated default model that improved approval rates by 5% at constant loss, then added fairness checks that reduced disparate impact by 40% without material profit loss."
---
## 4) Past failure and what I’d do differently (STAR example)
- Situation: We launched a credit scorecard refresh to boost approvals near the margin.
- Task: Improve approval rate without increasing losses or harming fairness metrics.
- Action: I focused on AUC and expected loss, but under-invested in calibration-by-group and post-approval pricing analysis. Two weeks post-launch, we saw stable loss but a widening APR residual for women (+35 bps after risk controls).
- Result: We paused pricing changes, ran a root-cause analysis, and found a feature interacting with employment tenure that was miscalibrated for a subgroup. We fixed calibration, added a parity constraint to the pricing model, and the residual closed to <5 bps with neutral unit economics.
- What I’d do differently: Include groupwise calibration in the pre-launch checklist, add a pricing-residual parity gate, run canary rollouts with tighter monitoring, and document fairness trade-offs in the model card.
---
## Quick checklist you could bring to an interview
- Compute applicant and qualified-applicant mix by group.
- Stagewise selection/booking rates and DIR (with CIs).
- Equal opportunity checks within risk bands; regression with group term.
- Model performance parity: AUC, calibration by group.
- Risk-adjusted pricing residual tests; adverse action reason parity.
- Proxy-feature audit; intersectional and geo analyses.
- Address label gaps and reject inference; consider boundary RCTs.
- Add fairness constraints/threshold tuning; governance and monitoring.
Conclusion: A 50/50 funding split is not sufficient evidence of fair lending. Use risk-adjusted, stage-specific, and pricing analyses—grounded in calibration and statistical testing—to assess and maintain fairness, with clear governance and monitoring.