This question evaluates a data scientist's competency in root-cause analysis for production metric anomalies, testing diagnostic reasoning across decision logs, model and rule changes, feature pipelines, traffic-mix shifts, experiment assignments, and external dependencies.
You are the on-call Data Scientist supporting the risk/underwriting system. Historically, the daily approval rate has been stable. Yesterday, it dropped sharply in a single day.
Assume you have access to decision logs, rule engine configs, model registry, feature store, experiment platform, observability/monitoring dashboards, and vendor status pages.
Approval rate = approvals / submitted applications, measured on decision-event timestamps. (If your org uses a different definition, state and use it.)
How would you diagnose the root cause? Provide:
Consider:
Login required