Payment Processing And Ledger Systems

What's being tested

A strong answer shows you can design a financially correct distributed system, not just a high-throughput CRUD API. Interviewers are probing whether you understand idempotency, ledger modeling, transaction boundaries, failure recovery, webhook delivery, security, and operational observability under real money-movement constraints. OpenAI cares because billing, credits, subscriptions, usage metering, refunds, and enterprise invoicing all require correctness under retries, partial failures, and third-party processor behavior. The best candidates separate the payment orchestration layer from the source-of-truth ledger, and they explicitly reason about what happens when any network call can time out after it has already succeeded.

Core knowledge

A payment lifecycle usually includes authorization, capture, void, refund, dispute, and settlement. Do not model “payment succeeded” as a single boolean; use a state machine with terminal states, valid transitions, timestamps, and external processor references.
Idempotency keys are mandatory for client-facing mutation APIs like POST /payments. Store (merchant_id, idempotency_key) -> request_hash, response_body, status in a durable database. If the same key arrives with the same request hash, replay the stored response; if it arrives with a different request hash, return an error such as 409 Conflict rather than creating a second charge.
Exactly-once semantics are usually implemented as at-least-once delivery plus idempotent consumers. Distributed systems rarely guarantee true exactly-once across your database, queue, and a card network; use dedupe tables, unique constraints, and transactional writes to make repeated messages harmless.
A double-entry ledger is the safest money model: every transaction writes balanced debits and credits, so within each transaction (and therefore across the whole journal) $\sum \text{debits} = \sum \text{credits}$. Example accounts: customer_cash, merchant_payable, processor_clearing, fees_revenue, refunds_payable. Never update balances without immutable journal entries.
Ledger immutability matters more than convenience. Do not edit a historical row to “fix” money; append a reversing entry or adjustment entry with a reason code. Derived balances can be cached, but the journal is the auditable source of truth.
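Database transactions should protect local invariants. In Postgres, write the payment record, idempotency record, ledger entries, and outbox event in one SERIALIZABLE or carefully constrained READ COMMITTED transaction; SERIALIZABLE transactions can abort with a serialization failure (SQLSTATE 40001), so the caller must be ready to retry the whole transaction. Use unique indexes for external IDs and idempotency keys.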
Transactional outbox avoids the classic “DB commit succeeded but message publish failed” bug. Write outbox_events inside the same transaction, then have a relay publish to Kafka, SQS, or Pub/Sub. Consumers dedupe using event_id and process idempotently.
Processor integration needs explicit handling for timeouts and ambiguous outcomes. If Stripe, Adyen, or another PSP times out, you cannot assume failure; query by idempotency key or external reference before retrying, and reuse the same idempotency key on any retry. The dangerous case is “the request timed out, but the card was charged.”
State machines should encode allowed transitions: created -> authorized -> captured -> settled, authorized -> voided, captured -> refunded, captured -> disputed. Guard transitions with compare-and-swap updates like WHERE status = 'authorized' to prevent racing captures or refunds.
Reconciliation is a first-class subsystem. Compare your internal ledger against processor settlement files, bank deposits, and webhook events. Track mismatches by payment_id, processor_charge_id, amount, currency, and effective date; auto-resolve known timing gaps, escalate true breaks.
Currency handling must avoid floating point. Store amounts as integer minor units plus ISO currency, for example amount_minor=1099, currency='USD'. Be careful with zero-decimal currencies like JPY, currency-specific minimums, and rounding for tax, fees, and FX.
Security and compliance engineering means minimizing sensitive data. Use PSP tokenization rather than storing PANs, isolate PCI-scoped services, encrypt secrets with KMS, redact logs, enforce least-privilege access, and support retention/deletion workflows for personal data where applicable.
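The lifecycle and transition points above can be sketched with Python's standard sqlite3 module standing in for Postgres. This is a minimal sketch, not a production design: the table layout and the names `create_schema` and `transition` are hypothetical. It shows two things from the text: a table of allowed transitions, and a compare-and-swap UPDATE whose WHERE clause re-checks the current status, so two racing captures cannot both succeed.

```python
import sqlite3

# Allowed transitions taken from the text. In this sketch, settled, voided,
# refunded and disputed have no outgoing edges, so they act as terminal states.
TRANSITIONS = {
    "created": {"authorized"},
    "authorized": {"captured", "voided"},
    "captured": {"settled", "refunded", "disputed"},
}


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE payments ("
        " id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " updated_at TEXT NOT NULL DEFAULT (datetime('now')))"
    )


def transition(conn: sqlite3.Connection, payment_id: str,
               from_status: str, to_status: str) -> bool:
    """Move a payment between states; returns False if another writer got there first."""
    if to_status not in TRANSITIONS.get(from_status, set()):
        raise ValueError(f"illegal transition {from_status} -> {to_status}")
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            # Compare-and-swap: the WHERE clause re-checks the current status.
            "UPDATE payments SET status = ?, updated_at = datetime('now') "
            "WHERE id = ? AND status = ?",
            (to_status, payment_id, from_status),
        )
    return cur.rowcount == 1
```

A caller that gets False should re-read the row to learn what already happened rather than retrying blindly.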
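A minimal sketch of the idempotency-key table described above, again assuming sqlite3 as a stand-in for a durable database. The schema and function names are hypothetical, and the sketch simplifies by storing the final response as soon as the request commits; a fuller version would also record an in-progress status for retries that arrive while the first request is still running. It replays the stored response for a matching request hash, returns 409 for a mismatch, and relies on the primary key, not the initial SELECT, to stop two concurrent requests from both creating a payment.

```python
import hashlib
import json
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE idempotency_keys ("
        " merchant_id TEXT NOT NULL,"
        " idempotency_key TEXT NOT NULL,"
        " request_hash TEXT NOT NULL,"
        " status_code INTEGER NOT NULL,"
        " response_body TEXT NOT NULL,"
        " PRIMARY KEY (merchant_id, idempotency_key))"
    )


def request_hash(body: dict) -> str:
    # Canonical JSON (sorted keys) so field order does not change the hash.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def create_payment(conn, merchant_id: str, idem_key: str, body: dict):
    """Returns (http_status, response_dict)."""
    h = request_hash(body)
    row = conn.execute(
        "SELECT request_hash, status_code, response_body FROM idempotency_keys "
        "WHERE merchant_id = ? AND idempotency_key = ?",
        (merchant_id, idem_key),
    ).fetchone()
    if row is not None:
        stored_hash, status_code, response_body = row
        if stored_hash != h:
            return 409, {"error": "idempotency key reused with a different request"}
        return status_code, json.loads(response_body)  # replay; do not charge again

    response = {
        "payment_id": "pay_" + uuid.uuid4().hex[:12],
        "amount_minor": body["amount_minor"],
        "currency": body["currency"],
        "status": "created",
    }
    try:
        with conn:
            # In a real service the payment row, ledger entries and outbox event
            # would be written in this same transaction.
            conn.execute(
                "INSERT INTO idempotency_keys VALUES (?, ?, ?, ?, ?)",
                (merchant_id, idem_key, h, 201, json.dumps(response)),
            )
    except sqlite3.IntegrityError:
        # A concurrent request with the same key committed first; replay its result.
        return create_payment(conn, merchant_id, idem_key, body)
    return 201, response
```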
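The double-entry and immutability rules above fit in a small append-only journal. In this sketch (class and method names are hypothetical, and the amounts are illustrative), `post` rejects any transaction whose debits and credits differ, balances are computed from the journal rather than stored, and a mistake is corrected by appending a reversing transaction with a reason code instead of editing rows. The sample posting records a 10.99 USD charge where the platform keeps a 0.62 fee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    txn_id: str
    account: str
    debit_minor: int
    credit_minor: int
    reason: str = ""


class Ledger:
    """Append-only journal; balances are always derived, never stored or edited."""

    def __init__(self):
        self._journal = []  # list of Entry, only ever appended to

    def post(self, txn_id, lines, reason=""):
        """lines: iterable of (account, debit_minor, credit_minor)."""
        entries = [Entry(txn_id, acct, dr, cr, reason) for acct, dr, cr in lines]
        if sum(e.debit_minor for e in entries) != sum(e.credit_minor for e in entries):
            raise ValueError(f"transaction {txn_id} is unbalanced")
        self._journal.extend(entries)  # validated first, so all-or-nothing

    def reverse(self, txn_id, new_txn_id, reason):
        """Correct a mistake by appending mirror-image entries, never by editing."""
        original = [e for e in self._journal if e.txn_id == txn_id]
        if not original:
            raise KeyError(txn_id)
        self.post(new_txn_id,
                  [(e.account, e.credit_minor, e.debit_minor) for e in original],
                  reason)

    def balance(self, account):
        # Debit-positive convention: liability and revenue accounts come out negative.
        return sum(e.debit_minor - e.credit_minor
                   for e in self._journal if e.account == account)


# Example: a 10.99 USD charge where the platform keeps a 0.62 fee (illustrative).
ledger = Ledger()
ledger.post("txn_charge_1", [
    ("processor_clearing", 1099, 0),  # processor owes us the gross amount
    ("merchant_payable", 0, 1037),    # we owe the merchant the net amount
    ("fees_revenue", 0, 62),          # our fee
])
```

Reversing txn_charge_1 appends three mirror-image entries, so every account returns to zero while the original entries stay in the journal for audit.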
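The transactional outbox and idempotent-consumer pattern above can be sketched with two sqlite3 databases, one for the producer and one for a consumer. Everything here is hypothetical scaffolding: `publish` stands in for a Kafka, SQS, or Pub/Sub producer, and the table names follow the text loosely. The state change and its outbox row commit in one transaction; the relay may publish a row more than once; and the consumer writes its dedupe row and its side effect in one transaction, so a redelivered event_id is rejected by the primary key.

```python
import json
import sqlite3
import uuid


def create_producer_schema(db):
    db.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    db.execute("CREATE TABLE outbox_events ("
               " event_id TEXT PRIMARY KEY, topic TEXT NOT NULL,"
               " payload TEXT NOT NULL, published_at TEXT)")


def capture_payment(db, payment_id):
    """State change and its event commit (or roll back) in ONE local transaction."""
    event_id = str(uuid.uuid4())
    with db:
        cur = db.execute("UPDATE payments SET status = 'captured' "
                         "WHERE id = ? AND status = 'authorized'", (payment_id,))
        if cur.rowcount != 1:
            raise ValueError(f"{payment_id} is not in state 'authorized'")
        db.execute("INSERT INTO outbox_events (event_id, topic, payload) VALUES (?, ?, ?)",
                   (event_id, "payment.captured", json.dumps({"payment_id": payment_id})))
    return event_id


def relay_once(db, publish):
    """Publish unpublished rows. A crash between publish() and the UPDATE means the
    row is published again on the next run, so delivery is at-least-once."""
    rows = db.execute("SELECT event_id, topic, payload FROM outbox_events "
                      "WHERE published_at IS NULL ORDER BY rowid").fetchall()
    for event_id, topic, payload in rows:
        publish(event_id, topic, payload)  # hypothetical broker producer callable
        with db:
            db.execute("UPDATE outbox_events SET published_at = datetime('now') "
                       "WHERE event_id = ?", (event_id,))
    return len(rows)


def create_consumer_schema(db):
    db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE receipts (payment_id TEXT NOT NULL, topic TEXT NOT NULL)")


def consume(db, event_id, topic, payload):
    """Idempotent consumer: the dedupe row and the side effect share one transaction."""
    try:
        with db:
            db.execute("INSERT INTO processed_events (event_id) VALUES (?)", (event_id,))
            db.execute("INSERT INTO receipts (payment_id, topic) VALUES (?, ?)",
                       (json.loads(payload)["payment_id"], topic))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: already handled, safe to acknowledge
```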
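For the ambiguous-timeout case above, here is a sketch against a hypothetical in-memory PSP stub (not a real Stripe or Adyen client; `FakePSP`, `lookup`, and `charge_safely` are invented for illustration). The stub can lose either the request or the response. On a timeout, the caller looks the charge up by idempotency key before doing anything else, and only retries, with the same key, if no charge exists.

```python
class PSPTimeout(Exception):
    """The call timed out; the charge may or may not have happened."""


class FakePSP:
    """Hypothetical in-memory PSP stub. Charges are keyed by idempotency key;
    fail_mode simulates a lost request ("before") or a charge whose response
    is lost ("after")."""

    def __init__(self):
        self.charges = {}
        self.fail_mode = None

    def charge(self, idempotency_key, amount_minor):
        mode, self.fail_mode = self.fail_mode, None  # each failure happens once
        if mode == "before":
            raise PSPTimeout("request never reached the PSP")
        if idempotency_key not in self.charges:  # PSP-side dedupe
            self.charges[idempotency_key] = {
                "id": f"ch_{len(self.charges) + 1}",
                "amount_minor": amount_minor,
                "status": "succeeded",
            }
        if mode == "after":
            raise PSPTimeout("card charged, but the response was lost")
        return self.charges[idempotency_key]

    def lookup(self, idempotency_key):
        return self.charges.get(idempotency_key)


def charge_safely(psp, idempotency_key, amount_minor, max_attempts=3):
    for _ in range(max_attempts):
        try:
            return psp.charge(idempotency_key, amount_minor)
        except PSPTimeout:
            # Never assume failure: ask the PSP what actually happened first.
            existing = psp.lookup(idempotency_key)
            if existing is not None:
                return existing  # it was charged; charging again would double-charge
            # Nothing found: loop and retry, reusing the SAME idempotency key.
    raise RuntimeError("outcome unknown; leave the attempt for reconciliation")
```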
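A minimal reconciliation pass over the fields listed above, assuming the internal ledger and the processor settlement file have already been parsed into the hypothetical `Record` type. The two-day `grace_days` window for timing gaps is an assumption; real settlement lags vary by processor. Matching is by processor_charge_id, and anything that is neither matched nor a recent timing gap is reported as a break.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Record:
    processor_charge_id: str
    amount_minor: int
    currency: str
    effective_date: date


def reconcile(internal, settlement, file_date, grace_days=2):
    """Match internal ledger records to a processor settlement file by charge ID."""
    ours = {r.processor_charge_id: r for r in internal}
    theirs = {r.processor_charge_id: r for r in settlement}
    result = {"matched": [], "timing": [], "breaks": []}
    for charge_id, r in ours.items():
        s = theirs.get(charge_id)
        if s is None:
            # Recent and not yet settled: an expected timing gap, auto-resolved.
            if file_date - r.effective_date <= timedelta(days=grace_days):
                result["timing"].append(charge_id)
            else:
                result["breaks"].append((charge_id, "missing_in_settlement"))
        elif (s.amount_minor, s.currency) != (r.amount_minor, r.currency):
            result["breaks"].append((charge_id, "amount_or_currency_mismatch"))
        else:
            result["matched"].append(charge_id)
    # Settled by the processor but absent from our ledger.
    for charge_id in sorted(theirs.keys() - ours.keys()):
        result["breaks"].append((charge_id, "missing_internally"))
    return result
```

Items classified as timing gaps are not closed; they are expected to match on a later file and should become breaks if they never do.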
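The currency rule above (integer minor units plus an explicit currency) can be sketched with Python's decimal module. The exponent table is a small illustrative subset of ISO 4217 (JPY has no minor unit; KWD has three decimals), and the half-up fee rounding is a policy assumption, not a rule. Parsing from a decimal string avoids the binary floating-point error that makes `0.1 + 0.2 != 0.3` in Python, and `to_minor` refuses amounts with more precision than the currency allows instead of silently rounding.

```python
from decimal import ROUND_HALF_UP, Decimal

# ISO 4217 minor-unit exponents for a few currencies (illustrative subset only).
MINOR_UNITS = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def to_minor(amount: str, currency: str) -> int:
    """Convert a decimal string such as "10.99" to integer minor units."""
    scaled = Decimal(amount).scaleb(MINOR_UNITS[currency])  # multiply by 10**exponent
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more precision than {currency} allows")
    return int(scaled)


def percent_fee(amount_minor: int, rate: str) -> int:
    """Percentage fee, rounded once at the end to a whole minor unit (half-up policy)."""
    fee = Decimal(amount_minor) * Decimal(rate)
    return int(fee.quantize(Decimal(1), rounding=ROUND_HALF_UP))


# Floats cannot represent most decimal fractions exactly: 0.1 + 0.2 != 0.3.
# Integer minor units plus a currency code can: 1099 with "USD" means 10.99 USD.
```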

Tip: A concise money-system answer usually has five pillars: API/idempotency, orchestration/state machine, immutable ledger, async events/reconciliation, and security/operations.

Worked example

For the prompt “Design a scalable payment processor,” start by clarifying scope: “Are we building a PSP like Stripe, or a merchant-side payment service integrating with an external PSP? What payment methods, target QPS, currencies, and requirements for refunds, disputes, and settlement?” Then declare assumptions: card payments only, external card network/PSP integration, 1k writes/sec to start, strict correctness over latency, and APIs for create payment, capture, refund, and get status.

Organize the answer around 4 pillars. First, define the external API with idempotency keys and stable resource IDs: POST /payments, POST /payments/{id}/capture, POST /payments/{id}/refunds. Second, describe the internal state machine and persistence model: payments, payment_attempts, ledger_entries, idempotency_keys, and outbox_events. Third, explain failure handling: external PSP calls can timeout, webhooks may arrive late or duplicated, and every handler must be idempotent. Fourth, cover operational controls: reconciliation jobs, audit logs, metrics like payment_success_rate, processor_timeout_rate, reconciliation_break_count, and alerts on stuck states.
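The “alerts on stuck states” control can be as simple as a periodic query for payments that have sat in a non-terminal state longer than expected. This sketch assumes a payments table with an ISO-8601 updated_at column written in one fixed format (so string comparison is chronological); the per-state budgets and the function name `find_stuck` are hypothetical, not recommendations.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical time budgets per non-terminal state; tune per business.
STUCK_AFTER = {
    "created": timedelta(minutes=15),
    "authorized": timedelta(days=7),  # uncaptured authorizations eventually expire
}


def find_stuck(conn: sqlite3.Connection, now: datetime):
    """Return (payment_id, status) for rows that sat in a state past its budget."""
    stuck = []
    for status, budget in STUCK_AFTER.items():
        cutoff = (now - budget).isoformat()  # same format as updated_at
        rows = conn.execute(
            "SELECT id FROM payments WHERE status = ? AND updated_at < ? ORDER BY id",
            (status, cutoff),
        ).fetchall()
        stuck.extend((payment_id, status) for (payment_id,) in rows)
    return stuck
```

The length of the returned list could be exported as a gauge and alerted on whenever it is above zero.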

A specific tradeoff to flag is whether to call the PSP inside the same request path or enqueue asynchronous processing. Synchronous processing gives simpler client semantics and a final result in a single round trip at small scale, but it ties availability to the PSP and makes timeouts user-visible. An asynchronous design returns payment_id quickly and lets clients poll or receive webhooks, but requires more status handling and UX work. A strong answer might choose synchronous authorization with a durable attempt record plus asynchronous reconciliation and webhook processing.
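A sketch of the hybrid suggested above, synchronous authorization with a durable attempt record, under hypothetical names and schema. The attempt row commits before the PSP call; a timeout marks the attempt unknown and hands it to an asynchronous resolver instead of guessing, and the client gets a 202 with status processing. `psp_authorize` is a stand-in callable, `resolve_queue` stands in for a real queue, and returning 402 for a decline is a design choice.

```python
import sqlite3
import uuid


class Declined(Exception):
    """Hypothetical: raised by the PSP client for a hard decline."""


def create_schema(conn):
    conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("CREATE TABLE payment_attempts ("
                 " id TEXT PRIMARY KEY, payment_id TEXT NOT NULL,"
                 " status TEXT NOT NULL, psp_ref TEXT)")


def set_attempt(conn, attempt_id, status):
    with conn:
        conn.execute("UPDATE payment_attempts SET status = ? WHERE id = ?",
                     (status, attempt_id))


def authorize_sync(conn, payment_id, amount_minor, psp_authorize, resolve_queue):
    """Synchronous authorization; returns (http_status, body)."""
    attempt_id = "att_" + uuid.uuid4().hex[:12]
    # 1. Commit the attempt BEFORE the network call, so a crash mid-call
    #    still leaves a record for the reconciler to find.
    with conn:
        conn.execute("INSERT INTO payment_attempts (id, payment_id, status) "
                     "VALUES (?, ?, 'pending')", (attempt_id, payment_id))
    try:
        # 2. attempt_id doubles as the PSP idempotency key.
        psp_ref = psp_authorize(attempt_id, amount_minor)
    except TimeoutError:
        # 3a. Ambiguous outcome: do not guess; hand off to async resolution.
        set_attempt(conn, attempt_id, "unknown")
        resolve_queue.append(attempt_id)
        return 202, {"payment_id": payment_id, "status": "processing"}
    except Declined:
        set_attempt(conn, attempt_id, "failed")
        return 402, {"payment_id": payment_id, "status": "declined"}
    # 3b. Success: record the PSP reference and advance the payment together.
    with conn:
        conn.execute("UPDATE payment_attempts SET status = 'succeeded', psp_ref = ? "
                     "WHERE id = ?", (psp_ref, attempt_id))
        conn.execute("UPDATE payments SET status = 'authorized' "
                     "WHERE id = ? AND status = 'created'", (payment_id,))
    return 200, {"payment_id": payment_id, "status": "authorized"}
```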

Close by saying: “If I had more time, I’d detail multi-region failover, PCI scope isolation, dispute handling, and ledger reconciliation against settlement files.”

A second angle

For a prompt like “Design webhook, POI, chat, CI/CD, payments,” the payment portion often appears as one subsystem in a broader design menu, so you need to be crisp rather than exhaustive. The transferable concept is reliable event delivery: payment status changes should emit durable events, and outgoing webhooks to merchants need retries, exponential backoff, signing, and deduplication. Unlike the full payment processor design, the interviewer may focus less on card authorization internals and more on how consumers learn that a payment.succeeded or refund.created event occurred.
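A sketch of the retry side of outgoing webhooks, assuming full-jitter exponential backoff. The 30-second base, six-hour cap, and eight attempts are hypothetical defaults, `send` is a stand-in for the HTTP call, and a production system would store each next attempt time in webhook_deliveries rather than loop in memory. Events that exhaust their attempts are dead-lettered.

```python
import random


def backoff_delay(attempt, base_s=30.0, cap_s=6 * 3600.0, rng=random.random):
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap_s, base_s * (2 ** attempt))


def deliver(send, event, max_attempts=8, rng=random.random):
    """Attempt one webhook delivery. `send` is a hypothetical callable returning an
    HTTP status code. Returns (outcome, delays) where delays are the waits a
    scheduler would insert between attempts."""
    delays = []
    for attempt in range(max_attempts):
        status = send(event)
        if 200 <= status < 300:
            return "delivered", delays
        if attempt + 1 < max_attempts:
            delays.append(backoff_delay(attempt, rng=rng))
    return "dead_lettered", delays  # park for inspection and manual replay
```

Jitter spreads retries out so that a merchant endpoint recovering from an outage is not hit by every pending delivery at once.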

A good framing is: “For payments, webhooks are a notification layer, not the source of truth; clients should be able to call GET /payments/{id} to verify status.” Then describe webhook_endpoints, webhook_deliveries, event_id, HMAC signatures, retry schedules, and dead-letter handling. The same idempotency principle applies on both sides: you dedupe inbound processor webhooks and provide stable IDs so downstream merchants can dedupe your outbound webhooks.
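HMAC signing for outgoing webhooks, sketched with Python's standard hmac module. The header format `t=<unix seconds>,v1=<hex digest>` and the five-minute tolerance are hypothetical choices, as are the names `sign` and `verify`. Signing the timestamp together with the body lets the receiver reject stale or replayed deliveries, and `hmac.compare_digest` avoids leaking timing information during comparison. The signature proves origin and integrity only; receivers should still dedupe on event_id.

```python
import hashlib
import hmac
import time


def sign(secret: bytes, payload: bytes, timestamp: int) -> str:
    # Sign "<timestamp>.<payload>" so the timestamp cannot be altered on its own.
    mac = hmac.new(secret, str(timestamp).encode() + b"." + payload, hashlib.sha256)
    return f"t={timestamp},v1={mac.hexdigest()}"


def verify(secret: bytes, payload: bytes, header: str,
           now=None, tolerance_s: int = 300) -> bool:
    try:
        fields = dict(item.split("=", 1) for item in header.split(","))
        timestamp = int(fields["t"])
        received = fields["v1"]
    except (ValueError, KeyError):
        return False  # malformed header
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > tolerance_s:
        return False  # outside the window: treat as a possible replay
    expected = hmac.new(secret, str(timestamp).encode() + b"." + payload,
                        hashlib.sha256).hexdigest()
    # Constant-time comparison of the expected and received digests.
    return hmac.compare_digest(expected.encode(), received.encode())
```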

Common pitfalls

Pitfall: Treating the payment table as the ledger.

A tempting but weak design stores payments(id, user_id, amount, status) and updates status='refunded' after a refund. That loses the audit trail and cannot represent partial refunds, fees, disputes, settlement timing, or accounting balances. A stronger answer keeps operational payment state separate from immutable double-entry journal entries.

Pitfall: Claiming “exactly once” without explaining the mechanism.

Saying “we use Kafka exactly-once” is not enough, because external PSP calls, database commits, and webhook delivery still create ambiguous outcomes. Interviewers want to hear concrete safeguards: idempotency keys, unique constraints, transactional outbox, dedupe tables, retry-safe state transitions, and reconciliation.

Pitfall: Over-indexing on scale while under-specifying correctness.

A candidate may jump to sharding, Cassandra, and millions of QPS before defining money invariants. For payment systems, correctness dominates: no double charges, no lost refunds, balanced ledger entries, durable audit logs, and recoverability after partial failure. Discuss scale after you have established the invariants and transaction boundaries.

Connections

Interviewers may pivot from this topic into webhook infrastructure, event-driven architecture, database isolation levels, distributed transactions, or usage-based billing and metering. They may also ask about multi-region availability, where the key tension is between low-latency failover and preserving single-writer financial invariants.
