Debugging a Churn Prediction Pipeline With Poor Generalization
Context
You are evaluating a binary churn prediction system with:
- Training ROC AUC: 0.95
- Validation ROC AUC (random split): 0.62
- Time-based holdout ROC AUC (most recent month): 0.55
- Predicted probabilities are overconfident
- Positive class prevalence ≈ 1:10 (about 9–10% positive)
Assume the goal is to predict whether a customer will churn in the next period using only information available up to an "as-of" cutoff time.
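One way to make the "as-of" discipline concrete is to build every feature from a point-in-time filtered event log. A minimal pandas sketch, assuming a hypothetical events table with customer_id, event_time, and amount columns:

```python
import pandas as pd

def build_features_asof(events: pd.DataFrame, cutoff: pd.Timestamp) -> pd.DataFrame:
    """Aggregate per-customer features from events strictly before the cutoff."""
    past = events[events["event_time"] < cutoff]  # nothing at/after the cutoff may leak in
    feats = past.groupby("customer_id").agg(
        n_events=("event_time", "size"),
        total_amount=("amount", "sum"),
        last_seen=("event_time", "max"),
    )
    # Recency is measured against the cutoff, not against "now" at training time.
    feats["days_since_last_event"] = (cutoff - feats["last_seen"]).dt.days
    return feats.drop(columns="last_seen")
```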
Task
Describe, step-by-step, how you would debug this system. Cover:
- Data validation and leakage checks (including temporal leakage)
- Label and feature drift analysis
- Cross-validation scheme selection
- Error analysis (by slices, calibration, threshold-dependent confusion matrices)
- Ablations and feature audits
- Training issues (regularization, class weighting, resampling)
Propose concrete experiments to isolate root causes, name the metrics you would inspect, and recommend fixes, together with a plan to verify improvements and prevent regressions (tests, data versioning, monitoring). Illustrative sketches for several of these checks follow.
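For the leakage check, a cheap probe is to score every numeric feature on its own: any single column that alone achieves near-perfect ROC AUC against the label is a leakage suspect (for example, a field populated only after churn is already known). A sketch, assuming X and y are pandas objects:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def single_feature_auc_scan(X: pd.DataFrame, y: pd.Series, top_k: int = 10) -> pd.Series:
    """Rank numeric features by standalone ROC AUC; scores near 1.0 are leakage suspects."""
    scores = {}
    for col in X.columns:
        x = X[col]
        if not np.issubdtype(x.dtype, np.number):
            continue  # this quick scan covers numeric columns only
        mask = x.notna()
        if y[mask].nunique() < 2:
            continue  # AUC is undefined when only one class remains
        auc = roc_auc_score(y[mask], x[mask])
        scores[col] = max(auc, 1.0 - auc)  # direction-agnostic
    return pd.Series(scores).sort_values(ascending=False).head(top_k)
```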
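For drift analysis, the population stability index (PSI) between the training period and the most recent month quantifies how far each feature's distribution has moved; applied to the model's predicted scores, the same function detects score drift. A sketch:

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI of `actual` (recent data) against `expected` (training-period reference).

    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
    """
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch recent values outside the training range
    edges = np.unique(edges)               # guard against duplicate quantile edges
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    a = np.histogram(actual, bins=edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))
```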
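For the cross-validation scheme, the random split is the wrong oracle here: it produces the inflated 0.62, not the 0.55 the model sees under production-like conditions. Walk-forward validation by month is one replacement (sklearn's TimeSeriesSplit gives the same shape for row-ordered data). A sketch, assuming numpy arrays and a per-row month label such as 'YYYY-MM' strings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def rolling_origin_auc(X: np.ndarray, y: np.ndarray, months: np.ndarray,
                       n_test_months: int = 3) -> dict:
    """Walk-forward evaluation: for each test month, train only on earlier months."""
    ordered = np.sort(np.unique(months))
    aucs = {}
    for test_month in ordered[-n_test_months:]:
        train, test = months < test_month, months == test_month
        model = LogisticRegression(max_iter=1000, class_weight="balanced")
        model.fit(X[train], y[train])
        aucs[str(test_month)] = roc_auc_score(y[test], model.predict_proba(X[test])[:, 1])
    return aucs  # a steady decline across months points to drift rather than noise
```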
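For the overconfidence symptom, a reliability table plus Brier score makes the problem measurable; if the high-probability bins confirm miscalibration, refitting with sklearn's CalibratedClassifierCV (isotonic or sigmoid) on a time-ordered fold is a standard fix. A sketch:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

def calibration_report(y_true: np.ndarray, y_prob: np.ndarray, n_bins: int = 10) -> None:
    """Print observed vs. predicted churn rate per probability bin, plus Brier score."""
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=n_bins, strategy="quantile")
    print(f"Brier score: {brier_score_loss(y_true, y_prob):.4f}")
    for pred, obs in zip(mean_pred, frac_pos):
        # Overconfidence shows up as predicted >> observed in the high bins.
        print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```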
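For ablations, retraining with one feature group removed at a time and scoring on the time-based holdout separates "the model genuinely depends on this group" (holdout AUC drops) from "this group is leaky or drift-prone" (holdout AUC rises when it is removed). A sketch, assuming pandas DataFrames, any sklearn-style classifier, and a hypothetical feature_groups mapping from group name to column names:

```python
from sklearn.base import clone
from sklearn.metrics import roc_auc_score

def group_ablation(model, X_train, y_train, X_holdout, y_holdout, feature_groups):
    """Retrain without each feature group and score on the time-based holdout."""
    base = clone(model).fit(X_train, y_train)
    results = {"__all_features__": roc_auc_score(
        y_holdout, base.predict_proba(X_holdout)[:, 1])}
    for name, cols in feature_groups.items():
        m = clone(model).fit(X_train.drop(columns=cols), y_train)
        results[name] = roc_auc_score(
            y_holdout, m.predict_proba(X_holdout.drop(columns=cols))[:, 1])
    return results
```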
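For training issues at roughly 1:10 prevalence, class weighting or explicit sample weights are the least invasive levers; note that both (like resampling) distort predicted probabilities, so recalibrate afterwards rather than trusting the raw scores. A sketch with synthetic data standing in for the real training set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_sample_weight

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.1).astype(int)  # ~1:10 prevalence, matching the setup

# Option 1: built-in inverse-prevalence reweighting.
clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)

# Option 2: explicit sample weights, usable with any estimator that accepts them.
w = compute_sample_weight(class_weight="balanced", y=y)
clf = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
```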