This question evaluates a candidate's incident response and operational troubleshooting skills, including observability-driven root-cause analysis, dependency isolation, and mitigation decision-making in a microservices transfer flow.
A microservices-based banking platform begins experiencing intermittent failures and elevated latency for transfer operations, starting at a specific time. Assume you have standard observability and deployment tooling (dashboards for metrics, logs, and traces; feature flags; canary releases and rollback; cloud infrastructure; message queues) and that transfer requests flow through an API gateway to one or more services that interact with a database and at least one external payment partner.
Describe your end-to-end troubleshooting approach:
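One early step in such an approach is dependency isolation: comparing each downstream dependency's error rate and tail latency before and after the time the incident started. The following is a minimal Python sketch of that comparison, assuming trace spans have already been exported into simple records. The `Span` fields, the dependency labels, and the helper names `summarize` and `most_regressed` are hypothetical and not tied to any particular tracing product.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Span:
    # Hypothetical record shape; real tracing exports will differ.
    dependency: str      # e.g. "database" or "payment_partner" (assumed labels)
    start_ts: float      # span start, seconds since epoch
    duration_ms: float   # span duration in milliseconds
    error: bool          # True if the call failed or timed out


def p95(values):
    """Approximate 95th percentile; falls back for very small samples."""
    if not values:
        return 0.0
    if len(values) < 2:
        return float(values[0])
    # n=20 yields 19 cut points; the last one is the 95th percentile.
    return quantiles(values, n=20, method="inclusive")[-1]


def summarize(spans, incident_start):
    """Group spans by dependency and by before/after the incident start."""
    buckets = defaultdict(lambda: {"before": [], "after": []})
    for span in spans:
        window = "after" if span.start_ts >= incident_start else "before"
        buckets[span.dependency][window].append(span)

    report = {}
    for dep, windows in buckets.items():
        report[dep] = {}
        for window, group in windows.items():
            errors = sum(1 for s in group if s.error)
            report[dep][window] = {
                "count": len(group),
                "error_rate": errors / len(group) if group else 0.0,
                "p95_ms": p95([s.duration_ms for s in group]),
            }
    return report


def most_regressed(report):
    """Return the dependency with the largest error-rate increase,
    using the p95 latency increase as a tie-breaker."""
    def delta(dep):
        before, after = report[dep]["before"], report[dep]["after"]
        return (
            after["error_rate"] - before["error_rate"],
            after["p95_ms"] - before["p95_ms"],
        )
    return max(report, key=delta)


if __name__ == "__main__":
    incident_start = 1000.0  # assumed incident start timestamp
    spans = [
        Span("database", 900.0, 10.0, False),
        Span("database", 950.0, 12.0, False),
        Span("database", 1010.0, 11.0, False),
        Span("database", 1020.0, 13.0, False),
        Span("payment_partner", 900.0, 100.0, False),
        Span("payment_partner", 950.0, 110.0, False),
        Span("payment_partner", 1010.0, 900.0, True),
        Span("payment_partner", 1020.0, 1000.0, False),
    ]
    report = summarize(spans, incident_start)
    print(most_regressed(report), report)
```

A result like this only indicates where to look next. A regressed dependency may be a symptom of something upstream (for example, retries amplifying load), so it would normally be cross-checked against recent deploys, feature-flag changes, and queue depth before choosing a mitigation such as rollback.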