Design a Payment Processing System
Company: OpenAI
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Onsite
## Design a Payment Processing System
Design the backend **payment system** for an application that charges users for purchases. A client (web/mobile checkout) sends a payment request; your service charges the user through one or more external **Payment Service Providers** (PSPs — e.g., card-network gateways), records the result, and exposes the payment's status. The system must be correct under retries and concurrency (**no double charges, no lost money**), durable, and fully auditable.
### Constraints & Assumptions
- ~1M payments/day, peaks ~5x average; amounts from $1 to $10,000.
- Integrate at least **two** external PSPs (for routing and failover).
- **Strong correctness:** exactly-once charge semantics from the user's perspective; money must never be created or destroyed in our records.
- **Full auditability:** every state change is recorded immutably.
- **PCI:** never store the raw card number (PAN); tokenize via the PSP.
- **Latency:** the charge API's p95 is a few seconds (bounded by the PSP); status reads must be fast.
- Support refunds (full and partial) and idempotent client retries.
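For sizing, the volume constraint above can be turned into a request rate with a quick back-of-envelope calculation. The sketch below assumes traffic is spread evenly across the day for the average and that the "5x" peak applies to that average; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

payments_per_day = 1_000_000   # from the constraints
peak_multiplier = 5            # "peaks ~5x average"

avg_tps = payments_per_day / SECONDS_PER_DAY
peak_tps = avg_tps * peak_multiplier

print(f"average ~{avg_tps:.1f} payments/s, peak ~{peak_tps:.1f} payments/s")
```

At roughly 12 payments per second on average and under 60 at peak, raw throughput is modest; the hard part of the design is correctness, not scale.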
### Clarifying Questions to Ask
- Card-only, or also wallets / bank transfers / ACH? (Assume card via PSP to start.)
- Are we only charging buyers, or are we also a marketplace that holds funds and pays out to sellers?
- Is the charge synchronous (user waiting at checkout) or can it be async?
- Multi-currency / FX, or single currency?
- What idempotency guarantees do clients provide (do they send a key)?
- What is the regulatory/PCI scope, and are there regional constraints?
### Part 1 — API, Data Model, and Idempotency
Design the charge API, the core data model, and the mechanism that guarantees a **client retry never results in a double charge**.
```hint Idempotency
Require a client-supplied idempotency key and persist `(key → outcome)`. A retried request returns the *same* recorded result instead of charging again.
```
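The `(key → outcome)` idea in the hint above can be sketched as follows. This is a minimal in-memory illustration, not a production design: `IdempotencyStore`, `IdempotencyConflict`, and `fake_charge` are hypothetical names, and a real service would more likely rely on a database table with a unique constraint on the key rather than a process-local lock.

```python
import hashlib
import json
import threading


class IdempotencyConflict(Exception):
    """The same key was reused with a different request body."""


class IdempotencyStore:
    """In-memory stand-in for a table with a UNIQUE constraint on the key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}  # key -> (request fingerprint, outcome)

    def run_once(self, key, request, charge_fn):
        fingerprint = hashlib.sha256(
            json.dumps(request, sort_keys=True).encode()
        ).hexdigest()
        # Simplification: the lock is held across the charge call so that
        # concurrent retries of the same key cannot both reach the PSP.
        with self._lock:
            if key in self._records:
                saved_fingerprint, outcome = self._records[key]
                if saved_fingerprint != fingerprint:
                    raise IdempotencyConflict(key)
                return outcome  # replay the recorded result, no new charge
            outcome = charge_fn(request)
            self._records[key] = (fingerprint, outcome)
            return outcome


# Usage: the second call with the same key never reaches the charge function.
calls = []

def fake_charge(request):
    calls.append(request)
    return {"status": "succeeded", "amount_cents": request["amount_cents"]}

store = IdempotencyStore()
req = {"user_id": "u1", "amount_cents": 1999, "currency": "USD"}
first = store.run_once("key-123", req, fake_charge)
second = store.run_once("key-123", req, fake_charge)
```

Storing a fingerprint of the request alongside the outcome lets the service reject a key that is reused for a different payment instead of silently returning an unrelated result.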
```hint Model the lifecycle
Represent a payment as a state machine with an append-only history, not a single mutable `status` column you overwrite. Distinguish the *intent* (what the user wants to pay) from each *attempt* (a specific call to a PSP).
```
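One way to picture the intent/attempt split and the append-only history is the sketch below. The state names, the `ALLOWED` transition table, and the rule that a new attempt may start only after every earlier attempt has definitively failed are assumptions chosen for illustration, not a prescribed model.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    CREATED = "created"
    PROCESSING = "processing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"  # PSP call timed out; outcome not yet known


ALLOWED = {
    State.CREATED: {State.PROCESSING},
    State.PROCESSING: {State.SUCCEEDED, State.FAILED, State.UNKNOWN},
    State.UNKNOWN: {State.SUCCEEDED, State.FAILED},
    State.SUCCEEDED: set(),
    State.FAILED: set(),
}


@dataclass
class PaymentAttempt:
    attempt_id: str
    psp: str
    history: list = field(default_factory=list)  # append-only (state, reason)

    def __post_init__(self):
        self.history.append((State.CREATED, "attempt created"))

    @property
    def state(self):
        # Current state is derived from the last history entry, never overwritten.
        return self.history[-1][0]

    def transition(self, new_state, reason):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.history.append((new_state, reason))


@dataclass
class PaymentIntent:
    intent_id: str
    amount_cents: int
    currency: str
    attempts: list = field(default_factory=list)

    def new_attempt(self, psp):
        # Only start another attempt once every earlier one has definitively failed.
        if any(a.state is not State.FAILED for a in self.attempts):
            raise ValueError("an earlier attempt is still open or already succeeded")
        attempt = PaymentAttempt(f"{self.intent_id}-a{len(self.attempts) + 1}", psp)
        self.attempts.append(attempt)
        return attempt


# Usage: first PSP declines, failover to a second PSP succeeds.
intent = PaymentIntent("pi_1", 1999, "USD")
a1 = intent.new_attempt("psp_a")
a1.transition(State.PROCESSING, "sent to PSP")
a1.transition(State.FAILED, "card declined")
a2 = intent.new_attempt("psp_b")
a2.transition(State.PROCESSING, "sent to PSP")
a2.transition(State.SUCCEEDED, "charge confirmed")
```

Because an attempt in `UNKNOWN` blocks `new_attempt`, a timeout cannot lead to a second charge at another PSP until the first outcome is resolved.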
### Part 2 — PSP Integration, the Ledger, and Consistency
Design how money actually moves through external PSPs and how you record it correctly. In particular: a network call to the PSP can **time out with an unknown outcome** — you don't know whether the charge succeeded. How do you keep your records consistent with the PSP and never double-charge?
```hint The dangerous case
The crux of payments is the *timeout / unknown outcome*. Never blindly re-charge after a timeout — design so the same logical charge is safe to re-issue or to query, end to end.
```
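The "query or safely re-issue" approach from the hint above might look like the following sketch. `FakePsp` is a hypothetical stand-in that deduplicates on an idempotency key and exposes a lookup by that key; real PSP APIs differ, and whether a given provider offers both capabilities is an assumption to verify.

```python
class PspTimeout(Exception):
    """The call to the PSP timed out; the outcome is unknown."""


class FakePsp:
    """Hypothetical PSP that deduplicates charges on an idempotency key."""

    def __init__(self, drop_first_response=False):
        self.charges = {}  # idempotency key -> charge record
        self._drop = drop_first_response

    def charge(self, key, amount_cents):
        if key not in self.charges:
            self.charges[key] = {
                "id": f"ch_{len(self.charges) + 1}",
                "amount_cents": amount_cents,
                "status": "succeeded",
            }
        if self._drop:
            self._drop = False
            raise PspTimeout(key)  # the charge happened, but the response was lost
        return self.charges[key]

    def lookup(self, key):
        return self.charges.get(key)


def charge_safely(psp, key, amount_cents, max_tries=3):
    for _ in range(max_tries):
        try:
            return psp.charge(key, amount_cents)
        except PspTimeout:
            # Outcome unknown: ask the PSP what it recorded before doing anything else.
            found = psp.lookup(key)
            if found is not None:
                return found
            # Nothing recorded: loop and re-issue with the SAME key,
            # so the PSP can deduplicate if the first call did land.
    return None  # still unknown; leave it to webhooks and reconciliation


# Usage: the first response is dropped, yet only one charge exists at the PSP.
psp = FakePsp(drop_first_response=True)
result = charge_safely(psp, "pi_1-a1", 1999)
```

The key passed to the PSP is derived from the attempt, not generated per call, which is what makes the retry safe.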
```hint Source of truth for money
Use a double-entry, append-only ledger (every movement has equal debits and credits) as the authoritative record of money — not mutable balance fields.
```
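A minimal sketch of such a ledger is shown below. The account names (`psp_receivable`, `revenue`), the use of integer cents, and the sign convention for balances (debits minus credits) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    account: str
    debit_cents: int = 0
    credit_cents: int = 0


class Ledger:
    def __init__(self):
        self._transactions = []  # append-only list of (txn_id, entries)

    def post(self, txn_id, entries):
        debits = sum(e.debit_cents for e in entries)
        credits = sum(e.credit_cents for e in entries)
        if debits != credits or debits <= 0:
            raise ValueError(f"transaction {txn_id} is empty or does not balance")
        self._transactions.append((txn_id, tuple(entries)))

    def balance(self, account):
        """Derived on read as debits minus credits; no mutable balance field."""
        return sum(
            e.debit_cents - e.credit_cents
            for _, entries in self._transactions
            for e in entries
            if e.account == account
        )


# Usage: capture $19.99, then partially refund $5.00 as a new, reversing transaction.
ledger = Ledger()
ledger.post("capture:pi_1", [
    Entry("psp_receivable", debit_cents=1999),
    Entry("revenue", credit_cents=1999),
])
ledger.post("refund:pi_1:1", [
    Entry("revenue", debit_cents=500),
    Entry("psp_receivable", credit_cents=500),
])
```

The refund never edits the capture; it appends an offsetting transaction, so the full history stays auditable and the balances across all accounts still sum to zero.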
### Part 3 — Webhooks, Reconciliation, Refunds, Failures, Observability
Design asynchronous settlement via PSP **webhooks**, periodic **reconciliation**, **refunds**, and what you monitor and alert on.
```hint Async truth
The PSP's final word often arrives later via webhooks. Make webhook handling idempotent and verify signatures, because webhooks can arrive twice, out of order, or be spoofed.
```
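The three webhook hazards named in the hint (duplicates, reordering, spoofing) can be handled along the lines of this sketch. The signature scheme (hex HMAC-SHA256 over the raw body with a shared secret), the event fields, and the status ranking are assumptions for illustration; each PSP documents its own scheme.

```python
import hashlib
import hmac
import json

SECRET = b"whsec_example"  # hypothetical shared secret

# Higher rank = later in the lifecycle; a stale event never moves a payment back.
RANK = {"pending": 0, "authorized": 1, "captured": 2}


def sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


class WebhookHandler:
    def __init__(self):
        self.seen_event_ids = set()
        self.status = {}  # payment_id -> latest applied status

    def handle(self, body: bytes, signature: str) -> str:
        # Constant-time comparison against the signature computed over the raw body.
        if not hmac.compare_digest(sign(body), signature):
            return "rejected"
        event = json.loads(body)
        if event["id"] in self.seen_event_ids:
            return "duplicate"
        self.seen_event_ids.add(event["id"])
        current = self.status.get(event["payment_id"], "pending")
        if RANK[event["status"]] <= RANK[current]:
            return "stale"
        self.status[event["payment_id"]] = event["status"]
        return "applied"


# Usage: "captured" arrives before "authorized", then is delivered a second time.
handler = WebhookHandler()
captured = json.dumps({"id": "evt_2", "payment_id": "pi_1", "status": "captured"}).encode()
authorized = json.dumps({"id": "evt_1", "payment_id": "pi_1", "status": "authorized"}).encode()

results = [
    handler.handle(captured, sign(captured)),
    handler.handle(authorized, sign(authorized)),
    handler.handle(captured, sign(captured)),
    handler.handle(captured, "bad-signature"),
]
```

In a real handler the seen-event set and the status update would be persisted in one transaction, so a crash between the two cannot cause an event to be applied twice or lost.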
```hint Catch drift
Periodically reconcile your ledger against the PSP's settlement report to detect any discrepancy in recorded amounts or transactions.
```
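A reconciliation pass can be as simple as a keyed diff between the two record sets, as in this sketch. It assumes both sides can be reduced to a mapping from PSP charge ID to amount in cents; the function name, report keys, and sample data are hypothetical.

```python
def reconcile(ledger_rows, psp_rows):
    """Compare two mappings of PSP charge id -> amount in cents."""
    report = {"missing_at_psp": [], "missing_in_ledger": [], "amount_mismatch": []}
    for charge_id in sorted(ledger_rows.keys() | psp_rows.keys()):
        ours = ledger_rows.get(charge_id)
        theirs = psp_rows.get(charge_id)
        if theirs is None:
            report["missing_at_psp"].append(charge_id)
        elif ours is None:
            report["missing_in_ledger"].append(charge_id)
        elif ours != theirs:
            report["amount_mismatch"].append((charge_id, ours, theirs))
    return report


# Usage with made-up sample data.
ledger_rows = {"ch_1": 1999, "ch_2": 500, "ch_3": 7500}
psp_rows = {"ch_1": 1999, "ch_2": 550, "ch_4": 1200}
report = reconcile(ledger_rows, psp_rows)
```

Each non-empty bucket is a candidate alert: a charge missing in the ledger suggests a lost timeout outcome or a missed webhook, while one missing at the PSP suggests a record of money that never moved.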
### Follow-up Questions
- Walk through exactly what happens when the PSP charge call times out — how do you avoid both double-charging and losing the payment?
- How do you implement a partial refund against a captured payment in the double-entry ledger?
- A webhook for the same event arrives twice, and another arrives out of order — how does your handler behave in each case?
- How would you extend this to a marketplace with seller payouts and held funds?
Quick Answer: This system design question evaluates a candidate's ability to architect a backend payment system that stays correct under retries, concurrency, and partial failure. It tests the practical application of idempotency, double-entry ledgers, and asynchronous webhook handling, and is commonly asked to assess distributed-systems reasoning about financial correctness and auditability.