This question evaluates incident-response and operational debugging skills, focusing on diagnosing high memory usage in a distributed microservice architecture that includes caching and third-party integrations.
You are the on-call engineer for a delivery platform. The courier app calls a backend service (Dasher Service), which then calls a Payment Card Integration Service.
High-level flow:
Courier App -> Dasher Service -> Payment Card Integration Service -> Third-Party Card Provider
Payment Card Integration Service <-> Redis cache
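
To make the request path concrete, here is a minimal Python sketch of the flow above. It is an illustration under stated assumptions, not the platform's actual code: the class names mirror the services in the diagram, but `FakeRedis`, `fake_provider`, `get_card_status`, `handle_courier_request`, the `card:` key format, and the read-through caching behavior are all hypothetical, since the prompt only says that the integration service talks to a Redis cache.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class FakeRedis:
    """In-memory stand-in for the Redis cache (hypothetical, illustration only)."""
    store: Dict[str, str] = field(default_factory=dict)

    def get(self, key: str) -> Optional[str]:
        return self.store.get(key)

    def set(self, key: str, value: str) -> None:
        self.store[key] = value


@dataclass
class PaymentCardIntegrationService:
    cache: FakeRedis
    provider: Callable[[str], str]  # stands in for the third-party card provider
    provider_calls: int = 0

    def get_card_status(self, courier_id: str) -> str:
        # Assumed read-through caching: check the cache, fall back to the provider.
        key = f"card:{courier_id}"
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        self.provider_calls += 1
        status = self.provider(courier_id)
        self.cache.set(key, status)
        return status


@dataclass
class DasherService:
    cards: PaymentCardIntegrationService

    def handle_courier_request(self, courier_id: str) -> str:
        # The courier app's request is forwarded to the integration service.
        return self.cards.get_card_status(courier_id)


def fake_provider(courier_id: str) -> str:
    """Hypothetical provider response; a real provider call would go over the network."""
    return f"active:{courier_id}"
```

In this sketch the only state the integration service holds in its own process is a call counter; cached values live in the cache object. When investigating memory, one useful question is whether the real service keeps any comparable data (responses, connections, buffers) in process memory rather than in Redis.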
It is 4:30 PM Pacific, during a busy period, and you are paged because the Payment Card Integration Service is showing much higher than expected memory utilization.
Explain how you would handle this on-call investigation. Your answer should cover: