PracHub

Train and analyze a classifier

Last updated: May 14, 2026

Quick Overview

This question evaluates proficiency in end-to-end machine learning engineering: exploratory data analysis, time-aware train/validation/test splitting, baseline modeling and iteration, class-imbalance strategies, leakage-safe hyperparameter tuning, metric computation and calibration, error analysis, reproducible training pipelines with CLI/config/seed control, explainability (feature importance, SHAP, and ablations), and documentation of risks, fairness checks, and monitoring hooks. It is commonly asked to assess whether a candidate can manage the full ML lifecycle and maintain data hygiene in production-like scenarios. It tests applied Data Manipulation (SQL/Python) and Machine Learning skills, emphasizing practical application alongside conceptual understanding of evaluation, calibration, and fairness.

  • Medium
  • OpenAI
  • Data Manipulation (SQL/Python)
  • Machine Learning Engineer

Train and analyze a classifier

Company: OpenAI

Role: Machine Learning Engineer

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Given a labeled dataset for binary classification, implement an end-to-end Python solution to train and analyze a classifier. Tasks:

  1. Perform EDA (missingness, outliers, leakage checks, target/feature drift over time).
  2. Create time-aware, stratified train/validation/test splits with proper cross-validation.
  3. Build a strong baseline and at least one improved model.
  4. Handle class imbalance (cost-sensitive loss, resampling, thresholds).
  5. Tune hyperparameters without leakage.
  6. Compute and compare metrics (ROC-AUC, PR-AUC, F1, calibration/Brier, confusion matrix at the chosen threshold).
  7. Conduct error analysis by slice and feature.
  8. Produce a reproducible training script with CLI, config, and seed control.
  9. Explain feature importance (SHAP) and validate with ablations.
  10. Document risks, fairness checks, and monitoring hooks for production.

Provide code snippets and explain your design choices.
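Steps 1–2 (EDA and time-aware splitting) can be sketched as follows. This is a minimal sketch on synthetic data; the column names `timestamp`, `x1`, and `label` are hypothetical stand-ins for whatever the real dataset provides. Note the design tension in step 2: a strict chronological split cannot also be exactly stratified, so in practice you check that class balance is comparable across the time blocks rather than forcing it.

```python
# Sketch: missingness/drift checks and a chronological 70/15/15 split.
# Synthetic data; real work would load the labeled dataset instead.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=n, freq="h"),
    "x1": rng.normal(size=n),
    "label": rng.integers(0, 2, size=n),
})

# EDA: fraction of missing values per column, and target rate over time
# (a drifting daily rate suggests the label distribution is non-stationary).
missing = df.isna().mean()
daily_rate = df.set_index("timestamp")["label"].resample("D").mean()

# Time-aware split: oldest 70% train, next 15% validation, newest 15% test,
# so evaluation always happens on data "from the future" of training.
df = df.sort_values("timestamp")
i_tr, i_va = int(0.70 * len(df)), int(0.85 * len(df))
train, val, test = df.iloc[:i_tr], df.iloc[i_tr:i_va], df.iloc[i_va:]
```

Splitting by time rather than at random is what makes the later leakage checks meaningful: any feature computed from future information will show up as implausibly strong validation performance.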
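For steps 3–4, one reasonable pairing is a class-weighted logistic regression baseline against gradient boosting with cost-sensitive sample weights. The data here is synthetic and imbalanced by construction; the weighting scheme (upweight positives by the negative/positive ratio) is one common choice, not the only one.

```python
# Sketch: baseline vs. improved model with cost-sensitive weighting.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
# Roughly 15% positives: an imbalanced target driven by two features.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 1.2).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: linear model with built-in inverse-frequency class weights.
baseline = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)

# Improved model: gradient boosting, upweighting the minority class.
pos_weight = (y_tr == 0).sum() / max((y_tr == 1).sum(), 1)
improved = HistGradientBoostingClassifier(random_state=0).fit(
    X_tr, y_tr, sample_weight=np.where(y_tr == 1, pos_weight, 1.0)
)

# PR-AUC is the headline metric under imbalance (ROC-AUC can look flattering).
ap_base = average_precision_score(y_te, baseline.predict_proba(X_te)[:, 1])
ap_impr = average_precision_score(y_te, improved.predict_proba(X_te)[:, 1])
```

Resampling (e.g. SMOTE) and threshold tuning are alternatives to weighting; in an interview it is worth naming all three and justifying one.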
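Step 5 (tuning without leakage) hinges on two things: preprocessing must live inside the cross-validated pipeline so it is fit only on each training fold, and the CV splitter must respect time order. A minimal sketch with an illustrative parameter grid:

```python
# Sketch: leakage-safe hyperparameter search.
# StandardScaler is re-fit per CV fold because it sits inside the Pipeline;
# TimeSeriesSplit ensures each validation fold is later than its training folds.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))   # assume rows are already in time order
y = (X[:, 0] > 0).astype(int)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},  # illustrative grid
    cv=TimeSeriesSplit(n_splits=4),
    scoring="average_precision",
)
search.fit(X, y)
```

The classic leakage bug this avoids is scaling (or imputing, or target-encoding) on the full dataset before splitting, which lets validation statistics bleed into training.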
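Step 6 asks for threshold-free metrics (ROC-AUC, PR-AUC), a calibration measure (Brier score), and a confusion matrix at a chosen operating threshold. A sketch with synthetic probabilities standing in for model output (in real code, `proba` would come from `predict_proba` on held-out data, and the threshold would be chosen on validation, not test):

```python
# Sketch: metric comparison and F1-maximizing threshold selection.
import numpy as np
from sklearn.metrics import (average_precision_score, brier_score_loss,
                             confusion_matrix, precision_recall_curve,
                             roc_auc_score)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
# Synthetic scores correlated with the label, clipped to [0, 1].
proba = np.clip(y_true * 0.6 + rng.normal(scale=0.25, size=500) + 0.2, 0, 1)

roc = roc_auc_score(y_true, proba)               # ranking quality
pr = average_precision_score(y_true, proba)      # PR-AUC, imbalance-aware
brier = brier_score_loss(y_true, proba)          # calibration (lower is better)

# Choose the threshold that maximizes F1 along the PR curve.
prec, rec, thr = precision_recall_curve(y_true, proba)
f1 = 2 * prec * rec / np.clip(prec + rec, 1e-12, None)
best_thr = thr[np.argmax(f1[:-1])]               # f1[:-1] aligns with thr

cm = confusion_matrix(y_true, (proba >= best_thr).astype(int))
```

If the Brier score is poor, a calibration step (Platt scaling or isotonic regression via `sklearn.calibration.CalibratedClassifierCV`) fit on validation data is the usual remedy.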
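Step 7 (error analysis by slice) is often just a groupby over predictions. Here `segment` is a hypothetical categorical feature; the point is to surface slices where the error rate is far from the global average.

```python
# Sketch: per-slice error rates, worst slices first. Synthetic data;
# `segment` stands in for any categorical feature worth slicing on.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "segment": rng.choice(["new", "returning", "enterprise"], size=600),
    "y_true": rng.integers(0, 2, size=600),
})
# Synthetic predictions: correct ~80% of the time.
df["y_pred"] = np.where(rng.random(600) < 0.8, df["y_true"], 1 - df["y_true"])

slice_report = (
    df.assign(error=df["y_true"] != df["y_pred"])
      .groupby("segment")["error"]
      .agg(["mean", "count"])            # error rate and support per slice
      .sort_values("mean", ascending=False)
)
```

Reporting the `count` alongside the error rate matters: a terrible-looking slice with 12 rows is noise, not a finding. The same table, computed over protected attributes, doubles as a first-pass fairness check for step 10.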
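Step 8's reproducible entry point can be sketched with the standard library alone. The flags and config keys here are illustrative, not prescribed by the question; the essential ideas are that every run is fully determined by a config plus a seed, and that the CLI seed can be overridden by the config file.

```python
# Sketch: reproducible training CLI with config and seed control.
import argparse
import json
import random

import numpy as np


def set_seed(seed: int) -> None:
    # Seed every RNG the pipeline touches (add torch/tf here if used).
    random.seed(seed)
    np.random.seed(seed)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Train a binary classifier")
    parser.add_argument("--config", type=str, default=None,
                        help="path to a JSON config file (optional)")
    parser.add_argument("--seed", type=int, default=42)
    args = parser.parse_args(argv)

    cfg = {}
    if args.config:
        with open(args.config) as f:
            cfg = json.load(f)

    seed = cfg.get("seed", args.seed)   # config wins over the CLI default
    set_seed(seed)
    # ... load data, train, evaluate, save artifacts tagged with the config ...
    return seed


if __name__ == "__main__":
    main()
```

Logging the resolved config and seed into the saved artifacts (not just stdout) is what makes a past run actually re-runnable.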
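For step 9, SHAP is one option; a model-agnostic alternative shown here is scikit-learn's `permutation_importance`, cross-checked with a drop-one-feature ablation as the question asks. The data is synthetic and constructed so that only the first feature matters, which makes the expected ranking obvious.

```python
# Sketch: permutation importance validated by retrain-without-feature ablation.
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 3))
y = (X[:, 0] > 0).astype(int)          # only feature 0 carries signal
model = LogisticRegression().fit(X, y)

# Permutation importance: score drop when one column is shuffled.
perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Ablation: retrain without each feature and measure the accuracy drop.
full_acc = model.score(X, y)
drops = []
for j in range(X.shape[1]):
    X_abl = np.delete(X, j, axis=1)
    m = LogisticRegression().fit(X_abl, y)
    drops.append(full_acc - m.score(X_abl, y))
```

The two views should agree on which features matter; a feature that looks important under SHAP or permutation but whose removal costs nothing on retraining is usually redundant with another feature, and saying so is exactly the kind of validation the question is probing for.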


Related Interview Questions

  • Write SQL for repeat churn - OpenAI (Hard)
  • Handle repeated churn in SQL - OpenAI (Hard)
  • Compute churn with re-subscriptions - OpenAI (Hard)
  • Debug and harden trial-assignment Python code - OpenAI (Medium)
  • Write SQL for post-trial conversion cohorts - OpenAI (Medium)
Jul 31, 2025


