Prevent Duplicate Payments Under High Load
Company: Rippling
Role: Software Engineer
Category: System Design
Interview Round: Onsite
## Prevent Duplicate Payments Under High Load
You are designing the payment-processing service for a company that charges customers' cards (e.g. a payroll or billing platform). Under high load, the same logical payment can be submitted more than once: a client double-clicks "Pay", a request times out and the caller retries, a mobile app reconnects and replays the request, or an internal service re-delivers a queued message.
Design the system so that **the same logical payment is charged at most once**, even when requests are duplicated, services crash mid-flight, the network drops responses, or the system is under heavy concurrent load.
Your design should address:
- How payment requests are made **idempotent** so a retry is recognized as the same operation rather than a new charge.
- How **concurrent duplicate requests** (two in-flight copies of the same payment) are serialized so only one actually charges.
- How **retries after timeouts or partial failures** are handled, especially the dangerous case where the charge *might* have succeeded but the response was lost.
- How you **coordinate with an external payment provider** (e.g. a card processor) that has its own success/failure semantics.
- The **data model, locking strategy, and uniqueness guarantees** that make the data store the source of truth for duplicate prevention.
- How you **monitor, detect, and recover** from stuck or ambiguous payments.
```hint Where to start
In a distributed system, a lost acknowledgement is indistinguishable from a lost request, so exactly-once delivery is unachievable. Reframe the problem: what property *can* you guarantee at the delivery layer, and how does a stable per-payment identifier help each layer downstream recognize a retry?
```
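One way to make the "stable per-payment identifier" concrete is sketched below. It assumes the caller supplies an idempotency key and the service stores a fingerprint of the request alongside it, so that a true retry can be told apart from a key reused with a different payload. The function names, the payload fields, and the in-memory `stored` dict (a stand-in for a database table) are all hypothetical, and the dict is not safe under concurrency.

```python
import hashlib
import json


def request_fingerprint(payload: dict) -> str:
    """Hash the fields that define the logical payment.

    Canonical JSON (sorted keys, fixed separators) makes the hash
    independent of field order in the incoming request.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify(stored: dict, key: str, payload: dict) -> str:
    """Classify a request against previously seen idempotency keys.

    `stored` maps idempotency key -> fingerprint. Returns:
      "new"      - key never seen; record it and proceed to charge
      "retry"    - same key, same payload; return the earlier result
      "conflict" - same key, different payload; reject the request
    """
    fingerprint = request_fingerprint(payload)
    if key not in stored:
        stored[key] = fingerprint
        return "new"
    if stored[key] == fingerprint:
        return "retry"
    return "conflict"
```

In this sketch a retry is recognized by the pair (key, fingerprint) rather than by the key alone, which guards against a client accidentally reusing a key for a different amount.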
```hint Serializing concurrent duplicates
When two identical requests arrive simultaneously, you need exactly one of them to proceed. Before reaching for a distributed lock, consider what primitive your relational database already provides that can serialize concurrent writes to the same logical entity atomically.
```
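The primitive this hint points toward can be illustrated with a unique constraint: the database lets exactly one insert for a given key succeed, and every concurrent duplicate fails with a constraint violation. The sketch below uses Python's built-in `sqlite3` purely so it is runnable; the table layout, column names, and the `try_claim` helper are assumptions for illustration, not a prescribed schema.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a uniqueness guarantee on the key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE payments (
            idempotency_key TEXT PRIMARY KEY,  -- enforces one row per payment
            amount_cents    INTEGER NOT NULL,
            status          TEXT NOT NULL
        )
        """
    )
    return conn


def try_claim(conn: sqlite3.Connection, key: str, amount_cents: int) -> bool:
    """Attempt to claim the payment; True only for the single winning request."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO payments (idempotency_key, amount_cents, status) "
                "VALUES (?, ?, 'PENDING')",
                (key, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        # Another request already inserted this key; do not charge again.
        return False
```

Only the caller that receives `True` would go on to contact the provider; a caller that receives `False` would read the existing row and return its current status.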
```hint The hardest failure
Suppose you sent a charge to the provider, the provider processed it, but the response never arrived — and now a retry is coming. What payment status would represent "we don't know if this charged"? How would you resolve that uncertainty before deciding whether to retry?
```
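The "we don't know if this charged" case can be modeled as an explicit status that is resolved by asking the provider, never by blindly charging again. The following is a minimal sketch under stated assumptions: the status names, the transition table, and the `call_provider` / `lookup` callables are hypothetical, and a timeout is represented by Python's built-in `TimeoutError`.

```python
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PENDING = "PENDING"
    PROCESSING = "PROCESSING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"  # charge was sent, outcome not observed


# Allowed transitions; SUCCEEDED and FAILED are terminal.
ALLOWED = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.SUCCEEDED, Status.FAILED, Status.UNKNOWN},
    Status.UNKNOWN: {Status.SUCCEEDED, Status.FAILED},
    Status.SUCCEEDED: set(),
    Status.FAILED: set(),
}


def transition(current: Status, new: Status) -> Status:
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


def charge(call_provider: Callable[[], bool]) -> Status:
    """Run one attempt. `call_provider` returns True/False for a definite
    outcome and raises TimeoutError when the response is lost."""
    status = transition(Status.PENDING, Status.PROCESSING)
    try:
        ok = call_provider()
    except TimeoutError:
        return transition(status, Status.UNKNOWN)
    return transition(status, Status.SUCCEEDED if ok else Status.FAILED)


def reconcile(status: Status, lookup: Callable[[], Optional[bool]]) -> Status:
    """Resolve UNKNOWN via a provider lookup: True = charged,
    False = definitely not charged, None = provider cannot say yet."""
    if status is not Status.UNKNOWN:
        return status
    outcome = lookup()
    if outcome is None:
        return status  # stay UNKNOWN and try again later
    return transition(status, Status.SUCCEEDED if outcome else Status.FAILED)


def lost_response() -> bool:
    """Simulated provider call whose response never arrives."""
    raise TimeoutError("response lost")
```

For example, `charge(lost_response)` yields `Status.UNKNOWN`, and `reconcile` leaves that status alone until the lookup returns a definite answer. Whether a lookup-by-reference endpoint exists depends on the provider, so that part is an assumption to confirm.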
### Constraints & Assumptions
State (or ask about) the operating envelope. Reasonable defaults for this problem:
- High write throughput on the payment path (assume thousands of payment attempts/sec at peak), with bursty retries amplifying load.
- A relational primary store is available and provides ACID transactions and unique constraints.
- The external payment provider exposes an HTTP API and *may or may not* support idempotency keys — design for both.
- Provider calls can time out, return ambiguous errors, or be slow (hundreds of ms to seconds).
- Correctness ranks above latency: it is acceptable to return `202 Accepted` and ask the caller to poll rather than risk a double charge.
- Money movement must be auditable; every attempt and its final state must be durably recorded.
### Clarifying Questions to Ask
- What is the unit of idempotency (one order, one invoice, or one specific attempt), and who generates the key (client, gateway, or service)?
- Does the external payment provider support idempotency keys, and does it expose a lookup-by-reference endpoint for reconciliation?
- Is the payment path synchronous (caller waits for `SUCCEEDED`) or asynchronous (enqueue and poll)? What latency SLA must we meet?
- What is the expected duplicate/retry rate and peak throughput we must serialize against?
- Are partial captures, multi-currency, or refunds in scope, or only single full-amount charges?
- What is the acceptable behavior when a payment is still in progress and a duplicate arrives — block, poll, or return an in-progress response?
### Follow-up Questions
- How long do you retain idempotency keys, and what happens when a caller legitimately wants to "pay again for the same order" — how do you distinguish a retry from an intentional new charge?
- The provider is down or extremely slow during a traffic spike. How do you shed load and preserve correctness without dropping payments or double-charging on recovery?
- Your primary database fails over to a replica with a few seconds of replication lag. How does that affect the uniqueness guarantee, and how do you avoid a duplicate charge across the failover window?
- How would you test this end-to-end — what fault injections (timeouts, duplicate delivery, crashes between DB commit and provider call) would you simulate, and what invariant do you assert?
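For the testing question, one possible shape of a fault-injection check is sketched below: a fake provider applies the charge and then loses the response, the client retries with the same key, and the test asserts that no key appears more than once in the ledger of money movements. `FakeProvider`, its `honors_keys` and `drop_first` parameters, and the helper functions are all hypothetical test scaffolding, not a real provider API.

```python
from collections import Counter


class FakeProvider:
    """In-memory stand-in for a card processor with injectable lost responses."""

    def __init__(self, honors_keys: bool, drop_first: int = 1) -> None:
        self.honors_keys = honors_keys  # whether duplicate keys are deduplicated
        self.drop_first = drop_first    # responses to lose per key
        self.ledger = []                # one entry per actual money movement
        self._calls = Counter()         # charge calls seen per key

    def charge(self, key: str, amount_cents: int) -> str:
        self._calls[key] += 1
        already_charged = any(k == key for k, _ in self.ledger)
        if not (self.honors_keys and already_charged):
            self.ledger.append((key, amount_cents))
        # Fault injection: the charge above happened, but the reply is lost.
        if self._calls[key] <= self.drop_first:
            raise TimeoutError("response lost after the charge was applied")
        return "SUCCEEDED"


def pay_with_retries(provider: FakeProvider, key: str, amount_cents: int,
                     max_attempts: int = 5) -> str:
    """Retry on timeout, always reusing the same idempotency key."""
    for _ in range(max_attempts):
        try:
            return provider.charge(key, amount_cents)
        except TimeoutError:
            continue
    return "UNKNOWN"


def max_charges_per_key(provider: FakeProvider) -> int:
    """The invariant under test: this value must never exceed 1."""
    counts = Counter(k for k, _ in provider.ledger)
    return max(counts.values(), default=0)
```

Running the same retry loop against a provider created with `honors_keys=False` makes the invariant fail, which is a useful sanity check that the test can detect a double charge at all.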
Quick Answer: This system design question tests a candidate's ability to reason about idempotency, distributed consistency, and fault tolerance in high-throughput payment systems. It evaluates practical knowledge of data modeling, concurrency control, and state machine design under real-world failure scenarios commonly assessed for backend and infrastructure roles.