PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches
|Home/Machine Learning/OpenAI

Debug a failing ML classifier

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in diagnosing and debugging production machine learning pipelines, covering data validation, temporal leakage detection, label and feature drift analysis, calibration, and training regularization considerations.

  • hard
  • OpenAI
  • Machine Learning
  • Software Engineer

Debug a failing ML classifier

Company: OpenAI

Role: Software Engineer

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

You are handed a binary classification pipeline for churn prediction. Training AUC is 0.95, validation AUC on a random split is 0.62, but AUC on a recent time-based holdout (most recent month) is 0.55. Predicted probabilities are overconfident, and the positive class prevalence is 1:10. Describe, step by step, how you would debug this system. Cover: data validation and leakage checks (including temporal leakage), label/feature drift analysis, cross-validation scheme selection, error analysis (by slices, calibration, threshold-dependent confusion matrices), ablations and feature audits, and training issues (regularization, class weighting, resampling). Propose concrete experiments to isolate root causes, the metrics you would inspect, and recommend fixes plus a plan to verify improvements and prevent regressions (tests, data versioning, monitoring).

Quick Answer: This question evaluates a candidate's competency in diagnosing and debugging production machine learning pipelines, covering data validation, temporal leakage detection, label and feature drift analysis, calibration, and training regularization considerations.

Related Interview Questions

  • Implement 1NN with NumPy - OpenAI (medium)
  • Compute entropy and implement 1-NN - OpenAI (medium)
  • Defend a Research Direction and Experiment Design - OpenAI (medium)
  • Debug MiniGPT and Backpropagate Matmul - OpenAI (medium)
  • Filter Bad Human Annotations - OpenAI (medium)
OpenAI logo
OpenAI
Jul 28, 2025, 12:00 AM
Software Engineer
Technical Screen
Machine Learning
41
0

Debugging a Churn Prediction Pipeline With Poor Generalization

Context

You are evaluating a binary churn prediction system with:

  • Training ROC AUC: 0.95
  • Validation ROC AUC (random split): 0.62
  • Time-based holdout ROC AUC (most recent month): 0.55
  • Predicted probabilities are overconfident
  • Positive class prevalence ≈ 1:10 (about 9–10% positive)

Assume the goal is to predict whether a customer will churn in the next period using only information available up to an "as-of" cutoff time.

Task

Describe, step-by-step, how you would debug this system. Cover:

  1. Data validation and leakage checks (including temporal leakage)
  2. Label and feature drift analysis
  3. Cross-validation scheme selection
  4. Error analysis (by slices, calibration, threshold-dependent confusion matrices)
  5. Ablations and feature audits
  6. Training issues (regularization, class weighting, resampling)

Propose concrete experiments to isolate root causes, the metrics you would inspect, and recommend fixes plus a plan to verify improvements and prevent regressions (tests, data versioning, monitoring).

Solution

Show

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Machine Learning•More OpenAI•More Software Engineer•OpenAI Software Engineer•OpenAI Machine Learning•Software Engineer Machine Learning
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.