Payment Systems: Ledgers, Idempotency, and Reconciliation
Asked of: Software Engineer
What's being tested
Interviewers are probing whether you can design money-moving systems that remain correct under retries, partial failures, duplicate messages, delayed callbacks, and inconsistent external providers. For a Software Engineer, the focus is not finance strategy; it is distributed-systems correctness, data modeling, API semantics, and operational recovery. Google cares because payments-like systems appear anywhere durable state must be exact: billing, ads spend, credits, quotas, refunds, subscriptions, and marketplace payouts. Strong answers separate the source of truth from derived views, define clear state transitions, and explain how to recover when systems disagree.
Core knowledge
- Double-entry ledger design records every movement as balanced debit and credit entries, not as mutable account balances. A core invariant is: sum(debits) = sum(credits) for each transaction batch. Balances should be materialized views over immutable journal entries, not the primary source of truth.
- Ledger immutability is the foundation of auditability. Instead of overwriting a payment's amount in place, append a correcting entry or reversal. This makes reconciliation, dispute handling, and historical debugging possible even months later.
- Idempotency keys prevent duplicate side effects when clients retry after timeouts. A common pattern, used by systems like Stripe, is storing (merchant_id, idempotency_key) -> request_hash, response, status. Same key plus same payload returns the original response; same key plus different payload should fail with 409 Conflict.
- Exactly-once effects are usually implemented with at-least-once delivery plus idempotent writes, not magical exactly-once networking. For example, POST /payments may be retried, webhooks may be delivered multiple times, and queue consumers may crash after committing to Postgres but before acknowledging a Kafka message.
- Payment state machines should be explicit and monotonic where possible: CREATED -> AUTHORIZED -> CAPTURED -> SETTLED, or CREATED -> FAILED, with separate refund states. Avoid ambiguous booleans like paid=true; model terminal states, retryable states, and external provider references.
- Outbox pattern helps coordinate database writes with asynchronous publishing. Write the ledger entry and an outbox_events row in one Postgres transaction, then a relay publishes to Pub/Sub or Kafka. This avoids the classic failure where the DB commit succeeds but the event publish fails.
- Reconciliation compares internal records against external truth sources such as processor settlement files, bank reports, or provider APIs. Match on stable identifiers like provider_charge_id, amount, currency, merchant, and settlement date. Expect timing gaps, partial captures, fees, chargebacks, and currency rounding differences.
- Balance computation should separate correctness from performance. The canonical balance is SUM(entries.amount) over immutable entries; for scale, maintain cached balances updated transactionally. For accounts with millions of entries, snapshot periodically and compute snapshot_balance + SUM(entries after snapshot).
- Concurrency control matters when multiple requests affect the same account. Use database transactions, row-level locks, optimistic version checks, or compare-and-swap semantics. For example, decrementing prepaid credits should be atomic: UPDATE accounts SET balance = balance - x WHERE id = ? AND balance >= x.
- Currency handling must avoid floating point. Store amounts as integers in the smallest currency unit, such as cents, or as fixed-precision decimals for currencies with nonstandard minor units. Include currency on every amount; never add USD and EUR balances without an explicit FX event.
- Failure modes should be first-class design cases: client timeout after success, provider accepts payment but callback is delayed, duplicate webhook arrives, internal DB commits but worker crashes, settlement file has an unknown transaction, or refund succeeds externally but internal update fails.
- Observability and audit trails should expose transaction IDs, idempotency keys, provider IDs, ledger batch IDs, and state transitions. Useful metrics include duplicate request rate, reconciliation mismatch count, unreconciled amount, pending settlement age, and payment state transition latency p99.
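As an illustration of the double-entry, immutability, and currency points in the list above, here is a minimal in-memory sketch. It is not a production ledger: the Entry and Ledger names, the sign convention (debits positive, credits negative), and the batch IDs are assumptions made for this example, while the account names are borrowed from the worked example in this guide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    account: str
    amount: int    # signed minor units (cents); assumed convention: debit > 0, credit < 0
    currency: str  # every amount carries its currency


class Ledger:
    """Append-only journal; balances are derived from entries, never stored as truth."""

    def __init__(self) -> None:
        self._journal: list[tuple[str, Entry]] = []  # (batch_id, entry)

    def post(self, batch_id: str, entries: list[Entry]) -> None:
        # Invariant: within one batch, debits and credits net to zero per currency.
        totals: dict[str, int] = defaultdict(int)
        for e in entries:
            if not isinstance(e.amount, int):
                raise TypeError("amounts must be integer minor units, not floats")
            totals[e.currency] += e.amount
        unbalanced = {c: t for c, t in totals.items() if t != 0}
        if unbalanced:
            raise ValueError(f"unbalanced batch {batch_id}: {unbalanced}")
        self._journal.extend((batch_id, e) for e in entries)

    def balance(self, account: str, currency: str) -> int:
        # Canonical balance: sum over immutable entries.
        return sum(e.amount for _, e in self._journal
                   if e.account == account and e.currency == currency)


ledger = Ledger()
ledger.post("capture-1", [Entry("buyer_cash_pending", 1000, "USD"),
                          Entry("merchant_receivable", -1000, "USD")])
# A correction is a new balancing batch, not an edit of the earlier one.
ledger.post("reversal-1", [Entry("buyer_cash_pending", -200, "USD"),
                           Entry("merchant_receivable", 200, "USD")])
```

Because the check is per currency, a batch that debits USD and credits EUR is rejected unless it also contains the balancing legs of an explicit FX event.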
Worked example
Design a payment ledger for a checkout system
A strong candidate would first clarify scope: “Are we processing cards directly or integrating with a provider? Do we need auth/capture/refund, multiple currencies, and merchant accounts? What is the required source of truth for balances?” Then they would declare assumptions: use an external processor for card movement, keep an internal immutable ledger in Postgres, and treat provider callbacks as asynchronous and potentially duplicated.
The answer should be organized around four pillars: API flow, ledger model, idempotency, and reconciliation. For the API flow, POST /payments accepts an idempotency key, creates a payment record, calls the provider, and records state transitions. For the ledger model, every captured payment creates balanced entries, for example debit buyer_cash_pending and credit merchant_receivable, and a later batch moves the amount from pending to settled. For idempotency, the system stores the original request hash and response, so a retry after timeout cannot create a second charge.
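The state transitions recorded by the API flow could be sketched roughly as follows. This is one possible shape, not a prescribed design: the Payment class, the ALLOWED table, and the choice to treat a repeated transition as a no-op (so a duplicate webhook is harmless) are assumptions for illustration. Only the transitions named in this guide are allowed, and refund states are omitted.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    CREATED = "CREATED"
    AUTHORIZED = "AUTHORIZED"
    CAPTURED = "CAPTURED"
    SETTLED = "SETTLED"
    FAILED = "FAILED"


# Forward-only transitions; SETTLED and FAILED are terminal.
ALLOWED = {
    State.CREATED: {State.AUTHORIZED, State.FAILED},
    State.AUTHORIZED: {State.CAPTURED},
    State.CAPTURED: {State.SETTLED},
    State.SETTLED: set(),
    State.FAILED: set(),
}


class InvalidTransition(Exception):
    pass


@dataclass
class Payment:
    payment_id: str
    state: State = State.CREATED
    history: list = field(default_factory=list)  # audit trail of transitions

    def apply(self, target: State, provider_ref: str = "") -> bool:
        """Move to target; return False if this is a replay of the current state."""
        if target == self.state:
            return False  # duplicate callback: no second side effect
        if target not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state.name} -> {target.name}")
        self.history.append((self.state, target, provider_ref))
        self.state = target
        return True
```

A caller would trigger ledger postings only when apply returns True, which keeps a replayed callback from posting twice.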
A key tradeoff to flag is synchronous versus asynchronous confirmation. Returning only after provider capture gives a simpler client experience but higher latency and timeout risk; returning PENDING with later webhook completion improves resilience but requires clients to handle eventual consistency. A strong close would add: “If I had more time, I would detail refund and chargeback reversal entries, reconciliation jobs against settlement files, and operational dashboards for stuck payments.”
A second angle
Make a payment API safe under retries
Here the framing shifts from full system design to API semantics and write-path correctness. The core idea is still preserving one financial effect per logical request, but the candidate should go deeper on the idempotency table, unique constraints, and transaction boundaries. A good design uses (caller_id, idempotency_key) as a unique key, stores request fingerprint and final response, and makes the payment creation and idempotency record update atomic. The edge case to discuss is concurrent duplicate requests: two identical retries may arrive simultaneously, so the system needs INSERT ... ON CONFLICT, row locks, or a PROCESSING state with safe polling behavior. The same ledger principle applies: even if the API response is retried, the ledger entries must be created exactly once.
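The idempotency table described above might look something like this sketch. It uses Python's bundled sqlite3 as a stand-in for a production database (an assumption; the ON CONFLICT clause needs SQLite 3.24 or newer), and the table names, column names, and response shape are hypothetical. The provider call is left out, so the key claim and the payment insert share one transaction.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE idempotency_keys (
        caller_id       TEXT NOT NULL,
        idempotency_key TEXT NOT NULL,
        request_hash    TEXT NOT NULL,
        status          TEXT NOT NULL,   -- PROCESSING or COMPLETED
        response        TEXT,
        PRIMARY KEY (caller_id, idempotency_key)
    );
    CREATE TABLE payments (
        payment_id INTEGER PRIMARY KEY,
        caller_id  TEXT NOT NULL,
        amount     INTEGER NOT NULL,     -- minor units
        currency   TEXT NOT NULL
    );
""")


def fingerprint(payload: dict) -> str:
    # Stable hash of the request body, independent of key order.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def create_payment(caller_id: str, key: str, payload: dict) -> tuple[int, dict]:
    req_hash = fingerprint(payload)
    with conn:  # one transaction: claim the key, create the payment, store the response
        cur = conn.execute(
            "INSERT INTO idempotency_keys (caller_id, idempotency_key, request_hash, status) "
            "VALUES (?, ?, ?, 'PROCESSING') "
            "ON CONFLICT (caller_id, idempotency_key) DO NOTHING",
            (caller_id, key, req_hash))
        if cur.rowcount == 0:  # another request already claimed this key
            stored_hash, status, response = conn.execute(
                "SELECT request_hash, status, response FROM idempotency_keys "
                "WHERE caller_id = ? AND idempotency_key = ?", (caller_id, key)).fetchone()
            if stored_hash != req_hash:
                return 409, {"error": "idempotency key reused with a different payload"}
            if status != "COMPLETED":
                # Reachable only when the claim and completion are separate transactions.
                return 409, {"error": "original request still processing"}
            return 200, json.loads(response)  # replay the original response
        cur = conn.execute(
            "INSERT INTO payments (caller_id, amount, currency) VALUES (?, ?, ?)",
            (caller_id, payload["amount"], payload["currency"]))
        body = {"payment_id": cur.lastrowid, "status": "CREATED"}
        conn.execute(
            "UPDATE idempotency_keys SET status = 'COMPLETED', response = ? "
            "WHERE caller_id = ? AND idempotency_key = ?", (json.dumps(body), caller_id, key))
        return 200, body
```

The unique key is what serializes concurrent duplicates: only one INSERT can claim the row, and every other request either replays the stored response or is told to retry.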
Common pitfalls
Pitfall: Treating account balance as the source of truth.
A tempting answer is “store balance on the user row and update it after each payment.” That can work as a cache, but it is insufficient for payments because you cannot audit why the balance changed. A stronger answer says balances are derived from immutable ledger entries, with cached balances updated transactionally for performance.
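To make the "cache updated transactionally" idea concrete, here is a small sketch using Python's bundled sqlite3 in place of Postgres (an assumption). The accounts and entries tables and the function names are hypothetical, and the entries are single-sided for brevity rather than full double-entry batches. The cached balance and the ledger entry change in the same transaction, and the guarded UPDATE refuses to overdraw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);  -- cache
    CREATE TABLE entries (
        entry_id   INTEGER PRIMARY KEY,
        account_id TEXT NOT NULL REFERENCES accounts(id),
        amount     INTEGER NOT NULL   -- signed minor units; source of truth
    );
""")


def credit(account_id: str, cents: int) -> None:
    with conn:  # cache and ledger entry commit together or not at all
        conn.execute("INSERT OR IGNORE INTO accounts (id, balance) VALUES (?, 0)",
                     (account_id,))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (cents, account_id))
        conn.execute("INSERT INTO entries (account_id, amount) VALUES (?, ?)",
                     (account_id, cents))


def spend(account_id: str, cents: int) -> bool:
    with conn:
        # Atomic guarded decrement: affects zero rows if funds are insufficient.
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (cents, account_id, cents))
        if cur.rowcount == 0:
            return False
        conn.execute("INSERT INTO entries (account_id, amount) VALUES (?, ?)",
                     (account_id, -cents))
        return True


def cached_balance(account_id: str) -> int:
    return conn.execute("SELECT balance FROM accounts WHERE id = ?",
                        (account_id,)).fetchone()[0]


def ledger_balance(account_id: str) -> int:
    # Canonical balance, recomputed from immutable entries.
    return conn.execute("SELECT COALESCE(SUM(amount), 0) FROM entries WHERE account_id = ?",
                        (account_id,)).fetchone()[0]
```

A periodic job that compares cached_balance with ledger_balance is one way to detect drift between the cache and the source of truth.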
Pitfall: Saying “use exactly-once messaging” without explaining the write path.
Interviewers will push on what happens when a worker writes to the database and crashes before acknowledging the message. A better answer acknowledges at-least-once delivery, then uses idempotent consumers, unique constraints, outbox events, and replay-safe handlers.
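One way to show a replay-safe handler is to record the message ID and the business effect in the same transaction, so a redelivered message hits a unique constraint instead of writing twice. The sketch below assumes Python's bundled sqlite3 as the database and a hypothetical message shape with message_id, payment_id, and amount fields. The broker is not modeled; the return value stands in for "safe to acknowledge".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY);
    CREATE TABLE ledger_entries (
        entry_id   INTEGER PRIMARY KEY,
        payment_id TEXT NOT NULL,
        amount     INTEGER NOT NULL
    );
""")


def handle(message: dict) -> bool:
    """Apply the message at most once; return True if this delivery did the work."""
    try:
        with conn:  # dedupe marker and ledger write commit atomically
            conn.execute("INSERT INTO processed_messages (message_id) VALUES (?)",
                         (message["message_id"],))
            conn.execute("INSERT INTO ledger_entries (payment_id, amount) VALUES (?, ?)",
                         (message["payment_id"], message["amount"]))
    except sqlite3.IntegrityError:
        # Redelivery after a crash between commit and ack: already applied, just ack.
        return False
    return True
```

If the worker crashes before the commit, nothing is recorded and the redelivery does the work; if it crashes after the commit but before the ack, the redelivery is rejected by the primary key and can be acknowledged without further writes.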
Pitfall: Ignoring external reconciliation.
Many candidates stop after “payment succeeded in our DB.” Real systems must compare internal state against processor settlement records because callbacks can be delayed, dropped, duplicated, or semantically different from final settlement. A strong candidate explicitly designs mismatch detection and manual or automated repair paths.
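Mismatch detection can start as a simple keyed comparison. The sketch below is illustrative only: the Record type, the report categories, and the assumption of exactly one record per provider_charge_id on each side are simplifications. Real settlement files with partial captures, fees, or chargebacks would need more than one row per charge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    provider_charge_id: str
    amount: int    # minor units
    currency: str


def reconcile(internal: list[Record], settlement: list[Record]) -> dict[str, list[str]]:
    """Bucket charge IDs by how internal records compare with the settlement file."""
    ours = {r.provider_charge_id: r for r in internal}
    theirs = {r.provider_charge_id: r for r in settlement}
    report: dict[str, list[str]] = {
        "matched": [], "amount_mismatch": [],
        "missing_external": [], "missing_internal": [],
    }
    for charge_id, rec in ours.items():
        ext = theirs.get(charge_id)
        if ext is None:
            report["missing_external"].append(charge_id)  # maybe just not settled yet
        elif (ext.amount, ext.currency) != (rec.amount, rec.currency):
            report["amount_mismatch"].append(charge_id)
        else:
            report["matched"].append(charge_id)
    # Charges the provider settled that we have no record of.
    report["missing_internal"] = sorted(theirs.keys() - ours.keys())
    return report
```

Each non-matched bucket maps to a different repair path: missing_external entries may simply be waiting on settlement, while missing_internal entries usually need investigation because money moved without a corresponding internal record.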
Connections
Interviewers may pivot from here into distributed transactions, saga orchestration, database isolation levels, event-driven architecture, or API design for retries. They may also ask about scaling hot accounts, multi-region consistency, refund workflows, or how to debug a production incident where customers were charged twice.
Further reading
- Stripe API Idempotent Requests: practical reference for idempotency-key behavior, request replay, and duplicate prevention.
- Martin Kleppmann, Designing Data-Intensive Applications: strong background on transactions, logs, replication, and exactly-once misconceptions.
- Pat Helland, “Life Beyond Distributed Transactions”: useful framing for sagas, uncertainty, and building reliable workflows without global transactions.
Related concepts
- Payment Processing And Ledger Systems (System Design)
- Wallets, Payments, And Refund Ledgers (System Design)
- Banking Ledgers And Cashback Operations (System Design)
- Distributed System Design For Ledgers And Counters (System Design)
- Donation And Payment Platforms (System Design)
- Distributed Systems Consistency, Reliability, And Observability (System Design)