This question evaluates the ability to diagnose discrepancies between offline and online model performance by reasoning about data distributions, feature computation, serving infrastructure, evaluation methodology, and feedback loops. It tests skills in debugging model deployments and in operational analytics.
You launch an ML model. Offline evaluation (validation/test) looked good, but after deployment the online metrics are significantly worse.
What step-by-step troubleshooting order would you follow to identify and fix the issue? Include what you would check in data, features, serving, evaluation, and feedback loops.
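For illustration, below is a minimal sketch of one check a strong answer might describe under the "features" step: comparing offline (training) and online (serving-log) feature distributions to surface training/serving skew. The file paths, column layout, and PSI threshold are assumptions, not part of the question.

```python
import numpy as np
import pandas as pd

def psi(expected: pd.Series, actual: pd.Series, bins: int = 10) -> float:
    """Population Stability Index between two numeric feature samples.
    Rough rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 large shift."""
    cuts = np.quantile(expected.dropna(), np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf          # open-ended outer bins
    cuts = np.unique(cuts)                       # guard against duplicate quantile edges
    e = pd.cut(expected, cuts).value_counts(normalize=True, sort=False) + 1e-6
    a = pd.cut(actual, cuts).value_counts(normalize=True, sort=False) + 1e-6
    return float(np.sum((a.values - e.values) * np.log(a.values / e.values)))

# Hypothetical inputs: a sample of training-time features and the same features
# as logged by the serving system for live requests.
train = pd.read_parquet("training_features_sample.parquet")
serving = pd.read_parquet("serving_logged_features.parquet")

# Flag numeric features whose serving distribution has drifted from training.
for col in train.select_dtypes(include="number").columns:
    if col in serving.columns:
        score = psi(train[col], serving[col])
        flag = "DRIFT?" if score > 0.25 else ""
        print(f"{col:30s} PSI={score:.3f} {flag}")
```

A candidate might pair this kind of distribution check with a row-level parity test (recomputing offline features for a handful of logged requests and diffing against what the serving path produced).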