This question evaluates a candidate's ability to design fault-tolerant payment workflows and reason about operational resilience, covering idempotency, retry strategies, queuing and dead-letter handling, circuit breaking, reconciliation, and monitoring.
You own a service that computes courier payouts and then submits payout requests to a downstream payment processor. What should your system do if the payment processor is temporarily unavailable, timing out, or returning intermittent errors?
Discuss how to preserve correctness and a good user experience, including: