System Design: Asynchronous API Call With Idempotency
Design an HTTP API that triggers a long-running operation (seconds to minutes) against downstream systems (e.g., processing a payment, generating a report, or provisioning resources). The API must be asynchronous, accepting the request and responding before the work completes, and it must guarantee idempotency so that client retries do not duplicate work.
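The following is one possible shape for the start-job handler, as a minimal in-memory sketch. It assumes the client sends a unique idempotency key with each logical request (for example in an `Idempotency-Key` header, as proposed in an IETF HTTPAPI working-group draft) and that keys are scoped per client. `IdempotentSubmitter`, `submit`, and the `/jobs/{id}` status path are hypothetical names. A retry with the same key and payload replays the original `202 Accepted`. Reusing the same key with a different payload is rejected with 422, the status that draft suggests, though that choice is an assumption here.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass


@dataclass
class StoredRequest:
    fingerprint: str  # hash of the canonicalized request body
    job_id: str       # job created by the first request that used this key


class IdempotentSubmitter:
    """In-memory sketch of idempotent job submission (single process only)."""

    def __init__(self) -> None:
        # (client_id, idempotency_key) -> the request first seen with that key
        self._by_key: dict[tuple[str, str], StoredRequest] = {}

    @staticmethod
    def _fingerprint(body: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def submit(self, client_id: str, idempotency_key: str, body: dict) -> tuple[int, dict]:
        scope = (client_id, idempotency_key)
        fp = self._fingerprint(body)
        existing = self._by_key.get(scope)
        if existing is not None:
            if existing.fingerprint != fp:
                # Same key reused for a different request: reject rather than guess.
                return 422, {"error": "idempotency key reused with a different payload"}
            # Retry of the same logical request: replay the original acceptance.
            return 202, {"job_id": existing.job_id, "status_url": f"/jobs/{existing.job_id}"}
        job_id = str(uuid.uuid4())
        self._by_key[scope] = StoredRequest(fp, job_id)
        # Enqueueing the actual work is omitted from this sketch.
        return 202, {"job_id": job_id, "status_url": f"/jobs/{job_id}"}
```

A dict check-then-insert is only safe within one process. Across multiple instances, the same guarantee would come from a unique constraint on `(client_id, idempotency_key)` in a shared store.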
Requirements
- Client calls an endpoint to start a job.
- The request may be retried due to timeouts, network errors, or 5xx responses.
- The system must not execute the operation more than once for the same logical request.
- Provide a way for clients to check job status and fetch the result.
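For the status requirement, the following sketches a job record with an explicit state machine and a hypothetical response shape for `GET /jobs/{job_id}`. The four states, the `TRANSITIONS` table, and `status_response` are illustrative assumptions, not a prescribed contract. Allowing `running -> pending` stands in for a worker whose lease expired and whose job is being requeued. Returning `Retry-After` on a `200` for in-progress jobs is a convention some asynchronous APIs use to pace polling, not something HTTP requires.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed transitions; terminal states have none.
TRANSITIONS: dict[JobStatus, set[JobStatus]] = {
    JobStatus.PENDING: {JobStatus.RUNNING, JobStatus.FAILED},
    # RUNNING -> PENDING models a requeue after a worker's lease expires.
    JobStatus.RUNNING: {JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.PENDING},
    JobStatus.SUCCEEDED: set(),
    JobStatus.FAILED: set(),
}


@dataclass
class Job:
    job_id: str
    status: JobStatus = JobStatus.PENDING
    result: Optional[dict[str, Any]] = None
    error: Optional[str] = None

    def transition(self, new: JobStatus) -> None:
        # Reject illegal moves, e.g. a late duplicate trying to restart a finished job.
        if new not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {new.value}")
        self.status = new


def status_response(job: Job, poll_after_s: int = 2) -> tuple[int, dict[str, str], dict[str, Any]]:
    """Hypothetical GET /jobs/{job_id} reply: (HTTP status, headers, JSON body)."""
    body: dict[str, Any] = {"job_id": job.job_id, "status": job.status.value}
    headers: dict[str, str] = {}
    if job.status is JobStatus.SUCCEEDED:
        body["result"] = job.result
    elif job.status is JobStatus.FAILED:
        body["error"] = job.error
    else:
        # Still in progress: hint when the client should poll again.
        headers["Retry-After"] = str(poll_after_s)
    return 200, headers, body
```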
Considerations
- High QPS across multiple service instances; the messaging layer provides at-least-once delivery.
- Partial failures (DB commit succeeds but enqueue fails, or vice versa).
- Exactly-once semantics are not assumed.
Describe:
- API endpoints and request/response shapes
- Data model
- Queue/worker flow
- How idempotency is enforced (including edge cases and TTL)
- Observability and operational concerns
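On the worker side, at-least-once delivery means the same message can arrive more than once. The following is a minimal sketch that assumes workers share the jobs table with the API layer. Each delivery tries to claim the job with a conditional `UPDATE ... WHERE status = 'pending'`, so only one delivery proceeds and duplicates are acknowledged as no-ops. `handle_message`, `purge_expired_keys`, and the schema are hypothetical. Key retention is modeled as deleting only terminal jobs older than a TTL, on the assumption that the TTL exceeds the longest window in which a client may still retry. Purging earlier would let a late retry create a second job. Recovery of a job stuck in `running` after a worker crash (for example via a lease or visibility timeout) is not shown.

```python
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE jobs (
    job_id TEXT PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,
    status TEXT NOT NULL,  -- pending | running | succeeded | failed
    result TEXT,
    created_at REAL NOT NULL
);
"""


def handle_message(conn: sqlite3.Connection, job_id: str, do_work: Callable[[str], str]) -> str:
    """Process one (possibly duplicate) queue delivery for job_id."""
    with conn:
        claimed = conn.execute(
            "UPDATE jobs SET status = 'running' WHERE job_id = ? AND status = 'pending'",
            (job_id,),
        ).rowcount
    if claimed == 0:
        # Duplicate delivery, or another worker already owns the job: ack and skip.
        return "skipped"
    try:
        # job_id doubles as the downstream idempotency key, so a retried
        # downstream call can also be deduplicated by that system.
        result = do_work(job_id)
    except Exception:
        # Simplification: every error is treated as terminal here; a real
        # worker would requeue retryable errors instead.
        with conn:
            conn.execute("UPDATE jobs SET status = 'failed' WHERE job_id = ?", (job_id,))
        return "failed"
    with conn:
        conn.execute(
            "UPDATE jobs SET status = 'succeeded', result = ? WHERE job_id = ?",
            (result, job_id),
        )
    return "succeeded"


def purge_expired_keys(conn: sqlite3.Connection, ttl_s: float, now: float) -> int:
    """Delete terminal jobs (and so their keys) older than the TTL; in-flight jobs are kept."""
    with conn:
        return conn.execute(
            "DELETE FROM jobs WHERE created_at < ? AND status IN ('succeeded', 'failed')",
            (now - ttl_s,),
        ).rowcount
```

The string outcomes returned by `handle_message` are a natural place to hang per-outcome counters (processed, duplicate, failed) for the observability requirement.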