This question evaluates a candidate's ability to diagnose production machine learning failures. It covers data quality, data and concept drift detection, model versioning and deployment checks, and operational debugging in an MLOps context.
You are on-call for a production machine learning service. Monitoring alerts show that model accuracy, which had been stable, suddenly dropped after a deployment. Labels may arrive with a delay, and traffic patterns can shift over time. You need to systematically diagnose and fix the issue.
Describe a step-by-step process to debug this accuracy drop, including:
Be specific about: