PracHub

Build and troubleshoot image classification and backprop

Last updated: Jun 3, 2026

Quick Overview

This question evaluates core ML concepts, assumptions, mathematical intuition, training and evaluation trade-offs, and practical failure modes in a realistic interview setting. A strong answer states its assumptions, handles edge cases, explains trade-offs, and shows clearly how to validate the result.


Company: OpenAI

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen


Jul 27, 2025, 12:00 AM

CIFAR-like Noisy Dataset: Baseline, Data Quality Plan, and First-Principles Backprop

Context: You have a CIFAR-like dataset of 32×32 RGB images spanning 10–20 classes. You suspect 8–15% label noise, some corrupted images, and class imbalance. You must deliver both a data-quality plan and a minimal from-scratch learning core.

Part 1 — Baseline classifier and data-quality improvement plan

Build a baseline classifier and a practical plan to improve data quality and robustness. Clearly describe:

  1. How you will:
    • Detect and quantify label noise.
    • Identify and filter corrupted or low-quality samples.
    • Manage class imbalance.
    • Create robust train/validation/test splits and prevent leakage.
  2. Compare mitigation strategies and when to use them:
    • Confidence-based pruning/reweighting.
    • Co-teaching.
    • Strong augmentations (e.g., MixUp, CutMix; optionally RandAugment).
  3. Show how each step changes metrics:
    • Top-1 accuracy.
    • Calibration (e.g., Expected Calibration Error (ECE) and Brier score).
    • Confusion matrix observations.
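
The metrics in item 3 can be computed directly from predicted probabilities. The following is a minimal NumPy sketch, assuming `probs` is an (n_samples, n_classes) array of softmax outputs and `labels` is an integer array of class indices. The function names and the choice of 15 equal-width confidence bins for ECE are illustrative assumptions, not part of the prompt.

```python
import numpy as np

def top1_accuracy(probs, labels):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(probs.argmax(axis=1) == labels))

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE with equal-width confidence bins (n_bins is an assumed default)."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so bin indices fall in [0, n_bins - 1]
    bin_ids = np.digitize(conf, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Weight each bin's |accuracy - confidence| gap by its share of samples
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)

def brier_score(probs, labels):
    """Mean squared distance between predicted probabilities and one-hot labels."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def confusion_matrix(probs, labels):
    """Rows are given labels, columns are predicted classes."""
    k = probs.shape[1]
    cm = np.zeros((k, k), dtype=int)
    np.add.at(cm, (labels, probs.argmax(axis=1)), 1)
    return cm
```

Recomputing these on the same validation split after each data-quality step makes the effect of that step visible. Note that when the validation labels are themselves noisy, all four numbers are biased, so a small manually verified subset is worth considering.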

Assume you can train a small CNN/ResNet for the baseline. Keep a held-out test set untouched.
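
With a baseline in hand, one common way to detect and quantify label noise is to compare each given label against the model's out-of-fold predicted probabilities (predictions for each sample from a model that did not train on it), which also leaves the held-out test set untouched. The sketch below is a simplified heuristic in the spirit of confident learning, not a reference implementation. `flag_label_issues`, `probs`, and `labels` are hypothetical names, and `probs` is assumed to be an (n_samples, n_classes) array of out-of-fold probabilities.

```python
import numpy as np

def flag_label_issues(probs, labels):
    """Flag samples whose given label disagrees with a confidently predicted class.

    Assumption: probs are out-of-fold probabilities, shape (n_samples, n_classes).
    Returns (flagged, likely_class, thresholds).
    """
    n, k = probs.shape
    # Per-class threshold: mean predicted probability of class j
    # over the samples that currently carry label j.
    thresholds = np.array([
        probs[labels == j, j].mean() if np.any(labels == j) else 1.0
        for j in range(k)
    ])
    above = probs >= thresholds            # classes predicted "confidently"
    masked = np.where(above, probs, -np.inf)
    likely_class = masked.argmax(axis=1)   # most probable confident class
    has_confident_class = above.any(axis=1)
    # A sample is suspicious if some class clears its threshold
    # and the best such class is not the given label.
    flagged = has_confident_class & (likely_class != labels)
    return flagged, likely_class, thresholds

# Toy example: the third sample is labeled 0 but looks like class 1.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
                  [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]])
labels = np.array([0, 0, 0, 1, 1, 1])
flagged, likely_class, thresholds = flag_label_issues(probs, labels)
estimated_noise_rate = flagged.mean()
```

The fraction of flagged samples gives a rough estimate of the noise rate to compare against the suspected 8–15%. Flagged samples are candidates for review, pruning, or down-weighting rather than confirmed errors, since minority classes and hard examples can also trip the thresholds.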

Part 2 — Core learning mechanics from first principles (NumPy-only)

Implement forward and backward passes for a two-layer neural network (Linear → ReLU → Linear → Softmax Cross-Entropy) using only NumPy-like linear algebra:

  1. Write vectorized forward and backward computations, including analytical gradients for all parameters.
  2. Validate gradients via numerical finite-difference checks.
  3. Discuss and implement numerical stability (e.g., log-sum-exp for softmax), sensible initialization, and regularization.
  4. Briefly describe how you would extend the architecture to CNNs suitable for this dataset.
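
Items 1 to 3 can be sketched together as follows. This is a minimal NumPy version under stated assumptions, not a definitive implementation: inputs are flattened to shape (n_samples, d_in), He initialization is used for the ReLU layer, L2 regularization is applied to the weights only, and all function and parameter names (`init_params`, `loss_and_grads`, `gradient_check`, `l2`, `eps`) are illustrative.

```python
import numpy as np

def init_params(d_in, d_hidden, n_classes, rng):
    # He initialization for the ReLU layer; 1/fan_in scaling for the output layer
    return {
        "W1": rng.standard_normal((d_in, d_hidden)) * np.sqrt(2.0 / d_in),
        "b1": np.zeros(d_hidden),
        "W2": rng.standard_normal((d_hidden, n_classes)) * np.sqrt(1.0 / d_hidden),
        "b2": np.zeros(n_classes),
    }

def loss_and_grads(params, X, y, l2=0.0):
    W1, b1, W2, b2 = (params[k] for k in ("W1", "b1", "W2", "b2"))
    n = X.shape[0]
    # Forward pass: linear -> ReLU -> linear
    z1 = X @ W1 + b1
    h = np.maximum(z1, 0.0)
    logits = h @ W2 + b2
    # Stable log-softmax: subtract the row max before exponentiating (log-sum-exp)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(n), y].mean()
    loss += 0.5 * l2 * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    # Backward pass: d(loss)/d(logits) = (softmax - onehot) / n
    dlogits = np.exp(log_probs)
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    dW2 = h.T @ dlogits + l2 * W2
    db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T
    dz1 = dh * (z1 > 0)                  # ReLU passes gradient only where z1 > 0
    dW1 = X.T @ dz1 + l2 * W1
    db1 = dz1.sum(axis=0)
    return loss, {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def gradient_check(params, X, y, l2=0.0, eps=1e-5):
    """Worst norm-based relative error between analytical and
    central-difference gradients, taken over all parameters."""
    _, grads = loss_and_grads(params, X, y, l2)
    worst = 0.0
    for name, p in params.items():
        num = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            loss_plus, _ = loss_and_grads(params, X, y, l2)
            p[idx] = old - eps
            loss_minus, _ = loss_and_grads(params, X, y, l2)
            p[idx] = old                 # restore the original value
            num[idx] = (loss_plus - loss_minus) / (2.0 * eps)
        denom = np.linalg.norm(num) + np.linalg.norm(grads[name]) + 1e-12
        worst = max(worst, float(np.linalg.norm(num - grads[name]) / denom))
    return worst

rng = np.random.default_rng(0)
params = init_params(d_in=4, d_hidden=6, n_classes=3, rng=rng)
X = rng.standard_normal((5, 4))
y = rng.integers(0, 3, size=5)
rel_err = gradient_check(params, X, y, l2=1e-3)
```

The check should be run in float64 on a tiny batch. A commonly used rule of thumb is that a relative error well below 1e-6 suggests the analytical gradients are correct, while larger values point to a bug or to a finite-difference step that crossed a ReLU kink.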

Constraints & Assumptions

  • Preserve the scope, facts, inputs, and requested outputs from the prompt above.
  • If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
  • Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.

Clarifying Questions to Ask

  • Clarify the task, data shape, labels, constraints, and evaluation metric.
  • State assumptions behind the math or modeling technique you choose.
  • Connect theory to practical training, debugging, and deployment implications.

What a Strong Answer Covers

  • Correct definitions and formulas where the prompt requires them.
  • A practical explanation of how the method behaves on real data.
  • Trade-offs, failure modes, diagnostics, and mitigation strategies.
  • Evaluation choices that match the product or modeling objective.

Follow-up Questions

  • How would noisy labels, class imbalance, or distribution shift affect the answer?
  • What would you monitor after deployment?
  • Which baseline would you compare against first?
© 2026 PracHub. All rights reserved.