Debug a failing ML classifier
Company: OpenAI
Role: Software Engineer
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
You are handed a binary classification pipeline for churn prediction. Training AUC is 0.95, validation AUC on a random split is 0.62, but AUC on a recent time-based holdout (most recent month) is 0.55. Predicted probabilities are overconfident, and the positive class prevalence is 1:10. Describe, step by step, how you would debug this system. Cover: data validation and leakage checks (including temporal leakage), label/feature drift analysis, cross-validation scheme selection, error analysis (by slices, calibration, threshold-dependent confusion matrices), ablations and feature audits, and training issues (regularization, class weighting, resampling). Propose concrete experiments to isolate root causes, the metrics you would inspect, and recommend fixes plus a plan to verify improvements and prevent regressions (tests, data versioning, monitoring).
Quick Answer: This question evaluates a candidate's competency in diagnosing and debugging production machine learning pipelines, covering data validation, temporal leakage detection, label and feature drift analysis, calibration, and training regularization considerations.