This question evaluates a candidate's competency in ML system design, specifically out-of-distribution detection, production monitoring, interpretability, and feedback loops that maintain model reliability and support data management.
You are building a product that uses an ML classifier in production (e.g., for routing, ranking, safety, fraud, or categorization). Over time, the live input distribution may shift and users may submit inputs that are out-of-distribution (OOD) relative to the model’s training data.
Design an end-to-end system to identify OOD data in production and support actions such as alerting, safe fallback behavior, and data collection for retraining.
Assume you can log inputs, embeddings, and predictions, and that you have a standard feature store and model-serving stack.
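As one possible starting point for an answer, the per-request part of such a system can be sketched as a distance-based detector over logged embeddings. Everything named here is an assumption for illustration and not part of the prompt: the `OODDetector` class, the diagonal-covariance simplification of the Mahalanobis distance, the 0.99 calibration quantile, and the `fallback` / `serve_model` action labels. A real design would likely calibrate the threshold on held-out in-distribution traffic and pair this with population-level drift monitoring.

```python
import math
import random
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass
class OODDetector:
    """Hypothetical detector: standardized distance from the training-embedding mean.

    Uses a diagonal covariance (per-dimension variance) as a simplification
    of the full Mahalanobis distance.
    """
    means: list
    variances: list
    threshold: float

    @classmethod
    def fit(cls, train_embeddings, quantile=0.99, eps=1e-6):
        # Transpose rows into per-dimension columns.
        dims = list(zip(*train_embeddings))
        means = [fmean(col) for col in dims]
        # eps guards against division by zero for constant dimensions.
        variances = [pvariance(col) + eps for col in dims]
        detector = cls(means, variances, threshold=math.inf)
        # Assumption: calibrate on the training scores themselves; a held-out
        # in-distribution sample would be a better choice in practice.
        scores = sorted(detector.score(e) for e in train_embeddings)
        idx = min(len(scores) - 1, int(quantile * len(scores)))
        detector.threshold = scores[idx]
        return detector

    def score(self, embedding):
        # Larger score means the embedding lies farther from the training data.
        return math.sqrt(sum(
            (x - m) ** 2 / v
            for x, m, v in zip(embedding, self.means, self.variances)
        ))

    def route(self, embedding):
        s = self.score(embedding)
        if s > self.threshold:
            # Flag as OOD: take the safe fallback path and keep the example
            # for labeling and possible retraining.
            return {"action": "fallback", "ood_score": s, "log_for_labeling": True}
        return {"action": "serve_model", "ood_score": s, "log_for_labeling": False}


# Synthetic stand-in for logged training embeddings (8-dim standard normal).
rng = random.Random(0)
train = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(2000)]
detector = OODDetector.fit(train)

print(detector.route([0.0] * 8)["action"])  # near the training mean
print(detector.route([6.0] * 8)["action"])  # far from the training data
```

In this sketch the `route` result carries the three actions the prompt asks for: the `ood_score` can feed alerting when the flagged rate rises, the `action` field selects the fallback path, and `log_for_labeling` marks examples to collect for retraining.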