Design a Payment System
Company: OpenAI
Role: Software Engineer
Category: System Design
Difficulty: easy
Interview Round: Technical Screen
## Design a Payment System
Design the backend payment system for an online platform that charges customers for goods and services (for example, a marketplace checkout or a usage-based SaaS/API billing product). Customers pay with credit and debit cards, and the actual money movement is performed by external Payment Service Providers (PSPs) such as Stripe or Adyen rather than by your system directly.
Your system is responsible for orchestrating a payment from "customer clicks Pay" through to a confirmed, recorded transaction: calling the PSP, recording an authoritative internal record of every charge, guaranteeing that a customer is never double-charged on retries, supporting refunds, and reconciling your records against what the PSP reports actually happened. Walk through the end-to-end design: the high-level architecture, the core data model, the read and write paths, how you handle failures and retries, and how the system scales.
```hint Where to start
Start from the money-movement flow and the single most dangerous failure mode: a network timeout where you do not know whether the PSP charged the card. Make the *write path idempotent* with a client-supplied idempotency key so a retried request maps to the same charge instead of creating a second one.
```
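The idempotent write path described in this hint can be sketched as follows. This is a minimal in-memory illustration, not a production design: `PaymentService`, `Charge`, and the `psp_charge` callable are hypothetical names, and a real system would back the key lookup with a database unique constraint rather than a dictionary.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Charge:
    charge_id: str
    amount_minor: int  # amount in minor units (e.g. cents), never floats
    currency: str
    status: str        # "pending" | "succeeded" | "failed"


class PaymentService:
    """Hypothetical service; single-threaded and in-memory for illustration."""

    def __init__(self, psp_charge: Callable[[str, int, str], str]):
        # psp_charge stands in for the PSP client call (an assumption).
        # It returns "succeeded" or "failed", or raises TimeoutError.
        self._psp_charge = psp_charge
        self._by_key: Dict[str, Charge] = {}

    def pay(self, idempotency_key: str, amount_minor: int, currency: str) -> Charge:
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            # A retry maps to the same charge; it never creates a second one.
            if (existing.amount_minor, existing.currency) != (amount_minor, currency):
                raise ValueError("idempotency key reused with different parameters")
            return existing

        # Record the intent BEFORE calling the PSP so a crash or timeout
        # still leaves a row that retries and reconciliation can find.
        charge = Charge(str(uuid.uuid4()), amount_minor, currency, "pending")
        self._by_key[idempotency_key] = charge
        try:
            # Forward the same key, assuming the PSP supports idempotency keys.
            charge.status = self._psp_charge(idempotency_key, amount_minor, currency)
        except TimeoutError:
            pass  # outcome unknown: stay "pending" until a webhook or lookup resolves it
        return charge
```

Note that a retry after a timeout returns the existing `pending` charge instead of calling the PSP again; resolving that `pending` state is left to the webhook and reconciliation paths.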
```hint Recording money
Model each payment as a state machine with a small set of explicit states (for example: created, pending, a terminal success or failure, and refunded), record every transition append-only, and pair it with a **double-entry ledger**. Never mutate a charge's amount in place; never trust a single boolean `paid` flag as the source of truth.
```
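One way to picture the state machine and the double-entry ledger from this hint is the sketch below. The state names follow the hint; the transition table, the `Ledger` class, and the account names used with it (`psp_receivable`, `revenue`) are illustrative assumptions. Partial refunds are modeled only as ledger postings here, which simplifies the single `refunded` state.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Allowed transitions (an assumed, simplified table).
ALLOWED: Dict[str, Set[str]] = {
    "created": {"pending"},
    "pending": {"succeeded", "failed"},
    "succeeded": {"refunded"},
    "failed": set(),
    "refunded": set(),
}


def transition(history: List[str], new_state: str) -> List[str]:
    """Return a new history with new_state appended; never rewrite old states."""
    current = history[-1]
    if new_state not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new_state}")
    return history + [new_state]


@dataclass(frozen=True)
class Entry:
    account: str
    amount_minor: int  # positive = debit, negative = credit
    currency: str


class Ledger:
    """Append-only ledger: every posting writes two balancing entries."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def post(self, debit: str, credit: str, amount_minor: int, currency: str) -> None:
        if amount_minor <= 0:
            raise ValueError("amount must be positive")
        self._entries.append(Entry(debit, amount_minor, currency))
        self._entries.append(Entry(credit, -amount_minor, currency))

    def balance(self, account: str, currency: str) -> int:
        return sum(e.amount_minor for e in self._entries
                   if e.account == account and e.currency == currency)

    def total(self, currency: str) -> int:
        # Should always be zero; a non-zero total signals a broken invariant.
        return sum(e.amount_minor for e in self._entries if e.currency == currency)
```

A charge of 1,000 minor units followed by a partial refund of 300 would be two postings, the second reversing the direction of the first, so the original amount is never edited in place.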
```hint PSPs lie about timing
The PSP is the source of truth for whether money moved, but it often confirms the final outcome asynchronously. Plan for **two** signals: the synchronous API response *and* an out-of-band webhook, and design **reconciliation** for when they disagree or one never arrives.
```
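The two-signal idea in this hint can be illustrated with a small sketch. Both functions are hypothetical: `resolve_status` combines the synchronous response with the webhook, and `reconcile` compares internal statuses against a PSP report that is assumed to have been parsed into a `charge_id -> status` mapping.

```python
from typing import Dict, List, Optional, Tuple

TERMINAL = {"succeeded", "failed"}


def resolve_status(sync_result: Optional[str], webhook_result: Optional[str]) -> str:
    """Combine both signals; None means that signal never arrived (e.g. a timeout)."""
    seen = {s for s in (sync_result, webhook_result) if s in TERMINAL}
    if not seen:
        return "pending"               # outcome unknown: query the PSP later
    if len(seen) == 1:
        return seen.pop()              # signals agree, or only one arrived
    return "needs_reconciliation"      # signals disagree: do not guess


def reconcile(internal: Dict[str, str],
              psp_report: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (charge_id, our_status, psp_status) for every mismatch."""
    mismatches = []
    for charge_id in sorted(set(internal) | set(psp_report)):
        ours = internal.get(charge_id, "missing")
        theirs = psp_report.get(charge_id, "missing")
        if ours != theirs:
            mismatches.append((charge_id, ours, theirs))
    return mismatches
```

Each mismatch would then be resolved in the PSP's favor and recorded as a new transition or ledger posting, rather than by overwriting history.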
### Constraints & Assumptions
State your own numbers, but a reasonable scoping is:
- ~10M customers; peak load of ~1,000 payment requests/second during bursts (e.g., a sale or a billing run), with the average far lower.
- Money is moved by one or more external PSPs over HTTPS; PSP calls have p99 latency in the hundreds of milliseconds and can time out.
- Correctness dominates latency: a payment may take a second or two, but it must **never** double-charge and must **never** lose a successful charge.
- Multiple currencies; refunds (full and partial) are required; chargebacks/disputes exist but can be handled out of band.
- The system must produce an auditable financial record and be able to reconcile with PSP settlement reports.
- Out of scope (call this out): card-number storage and PCI scope (delegated to the PSP via tokenization), fraud scoring, and tax computation.
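As a rough sizing check on the numbers above, Little's law (in-flight requests ≈ arrival rate × time in system) gives the number of concurrent PSP calls to plan for. The 500 ms latency below is an assumed value within the "hundreds of milliseconds" range, and `in_flight_calls` is a hypothetical helper.

```python
import math


def in_flight_calls(requests_per_second: int, latency_ms: int) -> int:
    """Little's law: concurrent calls = arrival rate * time each call is open."""
    return math.ceil(requests_per_second * latency_ms / 1000)


# Peak of ~1,000 requests/second with an assumed 500 ms PSP latency.
peak_concurrency = in_flight_calls(1000, 500)
```

Under these assumptions, roughly 500 PSP calls are open at once at peak, which bounds the connection pool and worker count needed before any headroom for retries.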
### Clarifying Questions to Ask
- What are we processing — one-time checkouts, recurring subscriptions, usage-based metered billing, or marketplace payouts to third parties? Each changes the data model and money-movement direction.
- Do we integrate a single PSP or must we route across several (for redundancy, cost, or geographic coverage)?
- What is the consistency/latency expectation at checkout — must the user see a final "paid" result synchronously, or is an async "we'll confirm shortly" acceptable?
- Who owns card data and PCI compliance? Are we tokenizing through the PSP so raw PANs never touch our servers?
- What are the refund and dispute requirements, and do we need a customer-facing or finance-facing ledger/reporting view?
- What regulatory/audit constraints apply (immutability of records, data residency, retention)?
### Follow-up Questions
- A PSP call times out and you never get a webhook. Walk through exactly how your system converges to the correct state and how long that takes. What does the customer see in the meantime?
- How do you guarantee exactly-once *effect* on the customer's card given that the network gives you at-least-once delivery on both your retries and the PSP's webhooks?
- How would you extend the design to route across multiple PSPs (failover when one is down, or cost-based routing) without breaking idempotency or the ledger?
- How do you support recurring/subscription billing and dunning (retrying failed renewals) on top of this core?
- How do you handle a chargeback/dispute weeks after the original payment, and how is that reflected in the ledger?
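For the exactly-once-effect question above, one common building block is deduplicating webhook deliveries by event ID. The sketch below is a hypothetical in-memory handler; it assumes the PSP attaches a stable unique ID to each event, and a real system would persist the seen IDs in the same transaction as the status update.

```python
from typing import Dict, Set


class WebhookHandler:
    """Applies each webhook event at most once, keyed by its event ID."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.status: Dict[str, str] = {}

    def handle(self, event_id: str, charge_id: str, new_status: str) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event_id in self._seen:
            return False  # at-least-once delivery: ignore the redelivery
        self._seen.add(event_id)
        self.status[charge_id] = new_status
        return True
```

Combined with an idempotency key on the outbound charge request, this gives at-least-once delivery on both sides with an exactly-once effect on the stored record.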
Quick Answer: This system design question evaluates a candidate's ability to architect a reliable backend payment flow, including idempotent request handling, transactional data modeling, and reconciliation with external processors. It is commonly asked to assess practical distributed systems skills around failure handling, consistency, and scalability under real-world conditions, testing applied architectural reasoning rather than pure theory.