You launch an ML model. Offline evaluation on the validation and test sets looked good, but after deployment the online metrics are significantly worse.
What step-by-step troubleshooting order would you follow to identify and fix the issue? Include what you would check in the data, the features, the serving path, the evaluation setup, and any feedback loops.
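One common culprit behind an offline/online gap is training-serving skew: the feature distributions the model sees in production drift away from what it was trained on. As a concrete illustration of the kind of data check the question is asking about, the sketch below computes the Population Stability Index (PSI) for a single feature, comparing its training distribution against values logged at serving time. The function name, the simulated data, and the 0.25 drift threshold (a common rule of thumb) are illustrative assumptions, not part of the question.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Measure how far a feature's serving distribution (`actual`)
    has drifted from its training distribution (`expected`).
    A common rule of thumb: PSI > 0.25 indicates significant drift."""
    # Bin edges are derived from the training distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))
    # Clip serving values into the training range so out-of-range
    # values land in the edge bins instead of being dropped.
    actual_clipped = np.clip(actual, edges[0], edges[-1])
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual_clipped, bins=edges)[0] / len(actual)
    # Floor the fractions to avoid log(0) and division by zero.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Simulated example: serving data that matches training vs. data that drifted.
rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)   # offline/training distribution
serve_same = rng.normal(0.0, 1.0, 10_000)      # serving matches training
serve_shifted = rng.normal(1.0, 1.0, 10_000)   # serving mean has drifted by 1 std

print(population_stability_index(train_feature, serve_same))     # near 0: no drift
print(population_stability_index(train_feature, serve_shifted))  # well above 0.25: drift
```

In practice you would run a check like this per feature over the serving logs, because a single drifted feature (a stale upstream pipeline, a changed unit, a default-filled null) can silently degrade online metrics while every offline evaluation still passes.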