Prompt
You are building a product that uses an ML classifier in production (e.g., for routing, ranking, safety, fraud, or categorization). Over time, the live input distribution may shift, and users may submit inputs that are out-of-distribution (OOD) relative to the model's training data.
Design an end-to-end system to identify OOD data in production and support actions such as alerting, safe fallback behavior, and data collection for retraining.
Requirements
- Detect OOD inputs in (near) real time and/or via batch monitoring (a minimal real-time scoring sketch follows this list).
- Minimize false alarms while still catching meaningful distribution shift.
- Provide interpretable signals to on-call/ML engineers (what changed, where, and how severe).
- Support a feedback loop: triage → labeling (if needed) → retraining/evaluation.
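
For concreteness, here is a minimal sketch of one real-time detector: a Mahalanobis-distance score over the classifier's embeddings, with an alert threshold calibrated on held-out in-distribution data so the false-alarm rate is bounded by design. The function names, synthetic data, and 99th-percentile threshold are illustrative assumptions, not a prescribed design.

```python
import numpy as np

def fit_gaussian(train_embeddings: np.ndarray):
    """Fit a mean and regularized shared covariance to in-distribution embeddings."""
    mu = train_embeddings.mean(axis=0)
    cov = np.cov(train_embeddings, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])   # keep the covariance invertible
    return mu, np.linalg.inv(cov)        # return mean and precision matrix

def ood_score(embedding: np.ndarray, mu: np.ndarray, precision: np.ndarray) -> float:
    """Squared Mahalanobis distance: higher = farther from the training data."""
    diff = embedding - mu
    return float(diff @ precision @ diff)

# Stand-in for real logged embeddings; substitute your training/validation sets.
rng = np.random.default_rng(0)
train, val = rng.normal(size=(5000, 16)), rng.normal(size=(1000, 16))

mu, precision = fit_gaussian(train)
# Calibrate the threshold offline, e.g. flag at most ~1% of held-out
# in-distribution traffic.
threshold = float(np.percentile([ood_score(e, mu, precision) for e in val], 99))

def is_ood(embedding: np.ndarray) -> bool:
    return ood_score(embedding, mu, precision) > threshold
```

Scoring against a single Gaussian keeps per-request latency to one matrix-vector product; per-class Gaussians or kNN distance in embedding space are common alternatives when a single mode is too coarse.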
What to cover
- Define what “OOD” means for this product (vs. mislabeled, rare-but-in-distribution, adversarial, or novel classes).
- Propose modeling/algorithmic approaches for OOD detection.
- Specify offline evaluation and online metrics (see the evaluation sketch after this list).
- Design the data/serving/monitoring architecture.
- Decide what happens when an input is flagged OOD (fallbacks, user experience, logging); a serving-side sketch follows this list.
- Handle edge cases: class imbalance, seasonality, new features, model updates, and cold start.
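
As one way to make the offline evaluation concrete, the sketch below scores a detector on held-out in-distribution (ID) and known-OOD examples and reports AUROC plus FPR at 95% TPR, two standard OOD-detection metrics. It assumes scikit-learn is available and that higher scores mean "more OOD"; the synthetic scores exist only to make it runnable.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate_detector(id_scores: np.ndarray, ood_scores: np.ndarray):
    """AUROC and FPR@95%TPR, treating OOD as the positive class
    (higher score = more OOD)."""
    y_true = np.concatenate([np.zeros(len(id_scores)), np.ones(len(ood_scores))])
    y_score = np.concatenate([id_scores, ood_scores])
    auroc = roc_auc_score(y_true, y_score)
    fpr, tpr, _ = roc_curve(y_true, y_score)
    # False-alarm rate at the first operating point that catches 95% of OOD.
    fpr_at_95tpr = float(fpr[np.searchsorted(tpr, 0.95)])
    return auroc, fpr_at_95tpr

# Synthetic example: OOD scores shifted above ID scores.
rng = np.random.default_rng(0)
print(evaluate_detector(rng.normal(0, 1, 1000), rng.normal(2, 1, 200)))
```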
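
And a sketch of the serving-side decision when an input is flagged: serve a safe fallback and log the case for triage and later labeling. The `ood_score`, `model`, and `fallback` callables are hypothetical stand-ins for whatever your stack provides.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ood")

def handle_request(features, ood_score, model, fallback, threshold):
    """Route to the model normally; on an OOD flag, serve a safe fallback
    and log the input so it can be triaged and labeled for retraining."""
    score = ood_score(features)
    if score > threshold:
        # Keep the signal interpretable: what was flagged and how severe.
        logger.warning("OOD input (score=%.2f > %.2f); serving fallback",
                       score, threshold)
        return fallback(features)
    return model(features)

# Toy usage: flag anything whose feature sum is large.
result = handle_request(
    features=[5.0, 7.0],
    ood_score=sum,                         # stand-in for a real detector
    model=lambda f: "model_prediction",
    fallback=lambda f: "safe_default",     # e.g. conservative default or human review
    threshold=10.0,
)
print(result)  # -> "safe_default" (score 12.0 exceeds 10.0)
```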
Assume you can log inputs/embeddings/predictions and that you have a standard feature store + model serving stack.
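
Given those logs, one common batch-monitoring signal is the population stability index (PSI), computed per feature against a training-time reference window. A minimal sketch follows; the bin count and the 0.1/0.2 watch/alert thresholds are conventions, not requirements.

```python
import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population stability index of one feature: live window vs. a
    training-time reference sample. Larger PSI = larger shift."""
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    edges = np.unique(edges)               # collapse duplicate quantiles
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the reference range
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    live_frac = np.histogram(live, edges)[0] / len(live)
    eps = 1e-6                             # avoid log(0) and division by zero
    ref_frac, live_frac = ref_frac + eps, live_frac + eps
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

# Common rule of thumb (a convention, not a law): PSI < 0.1 stable,
# 0.1-0.2 worth watching, > 0.2 investigate/alert.
rng = np.random.default_rng(0)
print(psi(rng.normal(0, 1, 10_000), rng.normal(0.5, 1, 2_000)))
```

Per-feature PSI reads well on a dashboard (which feature moved, and by how much), which serves the interpretability requirement; the same idea applied to embedding projections can catch shifts that raw features miss.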