Donation And Payment Platforms

What's being tested

These interviews test whether you can design a money-moving backend system where correctness matters more than raw feature velocity. DoorDash cares because donations, customer charges, refunds, Dasher payouts, and pay adjustments all require reliable state transitions across internal services, external payment processors, and asynchronous retries. The interviewer is probing for idempotency, transactional integrity, failure recovery, data modeling, API design, and your ability to reason about partial failures without hand-waving. A strong answer treats payments as a state machine backed by an auditable ledger, not as a single charge() function call.

Core knowledge

Payment state machines should be explicit, with transitions that only move forward: CREATED → AUTHORIZED → CAPTURED → SETTLED for a charge, or PLEDGED → PAYMENT_PENDING → PAID for a donation, with FAILED as a terminal branch from PAYMENT_PENDING and REFUNDED reachable only from PAID. Avoid ambiguous booleans like is_paid; they make retries, reversals, and reconciliation much harder.
Idempotency keys are mandatory for any client- or worker-retryable operation. Store idempotency_key, request hash, response body, status, and expiration. If the same key arrives with a different payload, return 409 Conflict rather than executing a second charge or payout.
Ledger modeling is safer than overwriting balances. Use immutable double-entry rows with fields such as account_id, entry_type, amount, currency, debit_credit, transaction_id, and created_at. The invariant is $\sum \text{debits} = \sum \text{credits}$ within each transaction, which supports audits and lets mistakes be fixed with correction entries instead of edits.
Transactional outbox prevents the classic “database write succeeded but event publish failed” bug. Write business state and an outbox_events row in the same Postgres transaction, then have a relay publish to Kafka, SQS, or another queue with idempotent consumers.
Webhook reconciliation treats external payment processors such as Stripe, Adyen, or Braintree as eventually consistent sources of truth. Validate signatures, persist raw webhook payloads, dedupe by provider event ID, and reconcile processor status against internal state.
Retry semantics need clear boundaries. Retry transient failures like 5xx, network timeouts, and rate limits using exponential backoff with jitter; do not blindly retry validation failures, insufficient funds, expired cards, or processor-declared permanent failures.
Exactly-once payment execution is usually implemented as at-least-once delivery plus idempotent side effects. Queues may redeliver messages, workers may crash after calling a provider, and webhooks may arrive out of order; correctness comes from dedupe tables and state guards.
Payout computation should separate calculation from disbursement. For Dasher pay, compute immutable earning components per delivery, adjustment, bonus, or tip; aggregate into a payout batch; then move money only after the batch is finalized and auditable.
API design should expose stable resource-oriented endpoints: POST /donations, GET /donations/{id}, POST /payout-computations, POST /payouts/{id}/retry. Include Idempotency-Key, ISO-8601 timestamps, minor currency units like cents, and structured errors with retryability flags.
Concurrency control matters for limited-time donation campaigns and batch payouts. Use unique constraints, conditional updates such as WHERE status = 'PENDING', row locks where necessary, and optimistic version fields to prevent double capture, over-allocation, or duplicate batch execution.
Observability should be designed into the workflow. Track payment_success_rate, payment_failure_rate, retry_count, webhook_lag_seconds, stuck_pending_count, duplicate_request_count, and p99 latency. Logs should include payment_id, provider_charge_id, idempotency_key, and correlation_id.
Compliance and security should keep card data out of your system unless absolutely necessary. Use provider tokenization, avoid storing PAN/CVV, encrypt sensitive fields, enforce least-privilege access, and design as though PCI DSS scope reduction is a hard requirement.
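
The idempotency-key rule in the list above can be sketched as follows. This is a minimal in-memory illustration, not a production design: IdempotencyStore, IdempotencyConflict, and the handler signature are hypothetical names, a real system would back this with a table carrying a unique constraint on the key, and expiration and concurrent in-flight requests are left out.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class StoredResponse:
    request_hash: str
    status_code: int
    body: dict


class IdempotencyConflict(Exception):
    """Same key reused with a different payload (would map to HTTP 409)."""


class IdempotencyStore:
    """In-memory stand-in for a table with a unique constraint on the key."""

    def __init__(self):
        self._rows = {}

    @staticmethod
    def _hash(payload: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()

    def execute(self, key: str, payload: dict, handler):
        request_hash = self._hash(payload)
        row = self._rows.get(key)
        if row is not None:
            if row.request_hash != request_hash:
                raise IdempotencyConflict(key)
            # Replay the stored response without running the side effect again.
            return row.status_code, row.body
        status_code, body = handler(payload)
        self._rows[key] = StoredResponse(request_hash, status_code, body)
        return status_code, body
```

A retried request with the same key and payload gets the stored response back, while the same key with a different amount raises IdempotencyConflict instead of creating a second charge.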

Worked example

For “Design an async donation payment platform”, a strong candidate would first clarify scope: are donations one-time or recurring, do we support refunds, what payment processor is assumed, what traffic spike should we handle, and is the donation considered complete when the processor authorizes, captures, or settles funds? Then declare assumptions: use tokenized payment methods, store amounts in minor units, use Postgres for transactional records, and use a queue for asynchronous payment processing.

The answer can be organized around four pillars: data model, request flow, asynchronous worker processing, and reconciliation. The data model should include donations, payment_attempts, ledger_entries, webhook_events, and outbox_events, with unique constraints on idempotency_key and provider IDs. The request flow should return quickly after creating a PENDING donation and enqueueing work, rather than blocking the caller on a processor call that may time out.
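
As a rough sketch of the request flow just described, the snippet below writes the PENDING donation and its outbox row in one transaction and lets a separate relay publish later. It uses SQLite from the Python standard library as a stand-in for Postgres; the column layout, the donation.created event name, and the function names are assumptions made for illustration.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE donations (
    id TEXT PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,
    amount_cents INTEGER NOT NULL CHECK (amount_cents > 0),
    currency TEXT NOT NULL,
    status TEXT NOT NULL
);
CREATE TABLE outbox_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    published_at TEXT
);
"""


def open_outbox_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def create_donation(conn, idempotency_key, amount_cents, currency):
    donation_id = f"don_{uuid.uuid4().hex}"
    try:
        with conn:  # commits both inserts together, rolls back both on error
            conn.execute(
                "INSERT INTO donations (id, idempotency_key, amount_cents, currency, status) "
                "VALUES (?, ?, ?, ?, 'PENDING')",
                (donation_id, idempotency_key, amount_cents, currency),
            )
            conn.execute(
                "INSERT INTO outbox_events (event_type, payload) VALUES (?, ?)",
                ("donation.created", json.dumps({"donation_id": donation_id})),
            )
    except sqlite3.IntegrityError:
        # Most likely a retried request: return the donation already stored.
        row = conn.execute(
            "SELECT id FROM donations WHERE idempotency_key = ?", (idempotency_key,)
        ).fetchone()
        if row is None:
            raise  # some other constraint failed, e.g. a non-positive amount
        return row[0]
    return donation_id


def relay_once(conn, publish):
    """Publish unpublished events in order; a crash here can cause a re-publish."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox_events "
        "WHERE published_at IS NULL ORDER BY id"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute(
                "UPDATE outbox_events SET published_at = datetime('now') WHERE id = ?",
                (event_id,),
            )
    return len(rows)
```

Because the relay marks an event as published only after the publish call returns, a crash between the two steps leads to a duplicate publish rather than a lost event, which is why downstream consumers still need to be idempotent.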

The worker should claim pending attempts, call the payment provider with its own idempotency key, and transition state only if the current state still allows it. Webhooks should be treated as authoritative signals but not blindly trusted: verify the signature, dedupe the event, and reconcile state transitions. One tradeoff to flag is synchronous versus asynchronous confirmation: synchronous gives the user immediate feedback but increases tail latency and timeout ambiguity; asynchronous improves resilience but requires a status endpoint and better UX around pending donations.
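
The claim-then-transition behavior of the worker might look like the sketch below, again using SQLite as a stand-in for Postgres. The payment_attempts columns, the PROCESSING state, the attempt:<id> key format, and the charge callable are all hypothetical; charge represents whatever provider client is in use and is assumed to accept an idempotency key.

```python
import sqlite3


def open_attempts_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payment_attempts ("
        "id TEXT PRIMARY KEY, status TEXT NOT NULL, provider_charge_id TEXT)"
    )
    return conn


def transition(conn, attempt_id, expected, new, provider_charge_id=None):
    """Move to `new` only if the row is still in `expected`; report success."""
    with conn:
        cur = conn.execute(
            "UPDATE payment_attempts "
            "SET status = ?, provider_charge_id = COALESCE(?, provider_charge_id) "
            "WHERE id = ? AND status = ?",
            (new, provider_charge_id, attempt_id, expected),
        )
    return cur.rowcount == 1


def process_attempt(conn, attempt_id, charge):
    # Claim the attempt; if another worker already did, skip the provider call.
    if not transition(conn, attempt_id, "PENDING", "PROCESSING"):
        return False
    # A key derived from the attempt ID lets the provider dedupe our retries.
    charge_id = charge(idempotency_key=f"attempt:{attempt_id}")
    return transition(conn, attempt_id, "PROCESSING", "CAPTURED", charge_id)
```

If a worker crashes after the provider call but before the second transition, the attempt stays in PROCESSING; a reconciliation job or the provider webhook would then need to resolve it, and the provider-side idempotency key keeps a repeated call from charging twice.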

A strong close would say: “If I had more time, I’d go deeper on refund flows, backfill/reconciliation jobs, and operational dashboards for stuck payments and webhook lag.”

A second angle

For “Design a resilient Dasher payment system”, the same core ideas apply, but the center of gravity shifts from customer charges to earned-balance correctness and payout batching. Instead of donation records, the key entities are deliveries, pay components, adjustments, ledger entries, payout batches, and disbursement attempts. The system must tolerate late corrections, duplicate delivery events, and retries from payout providers without paying a Dasher twice.

The important framing difference is that pay computation should be reproducible and auditable: given the same earning inputs and policy version, the result should be explainable. You would likely emphasize immutable earning events, batch finalization, and double-entry ledgers more than user-facing checkout latency. The same idempotency and reconciliation patterns still apply when calling external payout rails.
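
One way to picture reproducible pay computation is the sketch below, which aggregates immutable pay components into per-Dasher totals. PayComponent, PayoutBatch, and build_payout_batch are hypothetical names, and the single-policy-version rule is an assumption chosen to keep the example small.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PayComponent:
    component_id: str   # unique per earning event; used for deduplication
    dasher_id: str
    kind: str           # e.g. "base", "tip", "bonus", "adjustment"
    amount_cents: int   # adjustments may be negative
    policy_version: str


@dataclass(frozen=True)
class PayoutBatch:
    policy_version: str
    totals_cents: dict  # dasher_id -> total for this batch


def build_payout_batch(components, policy_version):
    """Pure function: same inputs and policy version give the same batch."""
    seen = set()
    totals = defaultdict(int)
    for component in sorted(components, key=lambda c: c.component_id):
        if component.policy_version != policy_version:
            raise ValueError(f"unexpected policy version on {component.component_id}")
        if component.component_id in seen:
            continue  # duplicate delivery event; count it once
        seen.add(component.component_id)
        totals[component.dasher_id] += component.amount_cents
    return PayoutBatch(policy_version, dict(sorted(totals.items())))
```

Keeping the calculation free of I/O means a finalized batch can be recomputed later from the stored components to explain or audit a payout, and disbursement can run as a separate step against the finalized totals.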

Common pitfalls

Pitfall: Treating payment as a single synchronous API call.

A tempting answer is “the API calls Stripe, stores success or failure, and returns.” That misses the hard part: provider timeouts, duplicate requests, delayed webhooks, and partial failures. A better answer models payment attempts, persists intermediate states, and reconciles asynchronously.
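
To make "persists intermediate states" concrete, here is a small sketch of an explicit transition table. The exact set of allowed transitions is an assumption for illustration (for example, that a refund is possible from CAPTURED or SETTLED); a real system would derive it from the processor's lifecycle.

```python
from enum import Enum


class PaymentState(Enum):
    CREATED = "CREATED"
    AUTHORIZED = "AUTHORIZED"
    CAPTURED = "CAPTURED"
    SETTLED = "SETTLED"
    FAILED = "FAILED"
    REFUNDED = "REFUNDED"


# Assumed transition table; terminal states map to an empty set.
ALLOWED = {
    PaymentState.CREATED: {PaymentState.AUTHORIZED, PaymentState.FAILED},
    PaymentState.AUTHORIZED: {PaymentState.CAPTURED, PaymentState.FAILED},
    PaymentState.CAPTURED: {PaymentState.SETTLED, PaymentState.REFUNDED},
    PaymentState.SETTLED: {PaymentState.REFUNDED},
    PaymentState.FAILED: set(),
    PaymentState.REFUNDED: set(),
}


class InvalidTransition(Exception):
    pass


def advance(current: PaymentState, target: PaymentState) -> PaymentState:
    """Return the new state, treating a repeated signal as a no-op."""
    if target == current:
        return current  # duplicate webhook or retried worker
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

With a table like this, a late or out-of-order webhook that tries to move a SETTLED payment back to AUTHORIZED is rejected explicitly instead of silently overwriting state.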

Pitfall: Saying “use Kafka” without explaining correctness.

Queues do not solve duplicate execution by themselves. Messages can be delivered more than once, consumers can crash mid-processing, and ordering is not guaranteed globally. The stronger answer is “use at-least-once delivery with idempotent consumers, unique constraints, state-machine guards, and an outbox.”
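
The "idempotent consumer" part of that answer can be sketched as a dedupe row written in the same transaction as the side effect. SQLite stands in for Postgres here, and the processed_messages and campaign_totals tables, along with the message shape, are hypothetical.

```python
import sqlite3


def open_consumer_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY);
        CREATE TABLE campaign_totals (
            campaign_id TEXT PRIMARY KEY,
            total_cents INTEGER NOT NULL
        );
        """
    )
    return conn


def handle_message(conn, message):
    """Apply the message once; return False if it was already processed."""
    try:
        with conn:  # dedupe row and side effect commit or roll back together
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message["id"],),
            )
            conn.execute(
                "INSERT OR IGNORE INTO campaign_totals (campaign_id, total_cents) "
                "VALUES (?, 0)",
                (message["campaign_id"],),
            )
            conn.execute(
                "UPDATE campaign_totals SET total_cents = total_cents + ? "
                "WHERE campaign_id = ?",
                (message["amount_cents"], message["campaign_id"]),
            )
    except sqlite3.IntegrityError:
        return False  # primary-key violation: this message is a redelivery
    return True
```

The increment on its own is not idempotent, but the primary key on message_id turns a redelivered message into a rolled-back transaction, so at-least-once delivery still produces the right total.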

Pitfall: Ignoring auditability and reversals.

For money systems, updating a balance column directly is usually not enough. Interviewers expect you to preserve history, support refunds or adjustments, and explain how finance or support can answer “what happened to this dollar?” Immutable ledger entries plus correction transactions land much better.
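
A minimal sketch of immutable ledger entries with correction transactions is shown below. The Ledger class, the account names in the usage note, and the debit-minus-credit balance convention are assumptions for illustration; the point is that entries are only appended and every transaction must balance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    transaction_id: str
    account_id: str
    direction: str      # "debit" or "credit"
    amount_cents: int


class Ledger:
    """Append-only ledger: entries are never updated or deleted."""

    def __init__(self):
        self.entries = []

    def post(self, transaction_id, lines):
        """`lines` is a list of (account_id, direction, amount_cents) tuples."""
        for _, direction, amount in lines:
            if direction not in ("debit", "credit") or amount <= 0:
                raise ValueError("invalid ledger line")
        debits = sum(a for _, d, a in lines if d == "debit")
        credits = sum(a for _, d, a in lines if d == "credit")
        if debits != credits:
            raise ValueError("transaction does not balance")
        self.entries.extend(Entry(transaction_id, acc, d, a) for acc, d, a in lines)

    def reverse(self, transaction_id, reversal_id):
        """Correct a transaction by posting its mirror image, not by editing it."""
        flipped = [
            (e.account_id, "credit" if e.direction == "debit" else "debit", e.amount_cents)
            for e in self.entries
            if e.transaction_id == transaction_id
        ]
        if not flipped:
            raise ValueError("unknown transaction")
        self.post(reversal_id, flipped)

    def balance(self, account_id):
        """Debits minus credits for one account, derived from the entries."""
        return sum(
            e.amount_cents if e.direction == "debit" else -e.amount_cents
            for e in self.entries
            if e.account_id == account_id
        )
```

For example, posting a 2500-cent donation as a debit to a processor_receivable account and a credit to a donations_payable account, then calling reverse for a refund, leaves both balances at zero while all four entries remain available to explain what happened.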

Connections

Interviewers may pivot from this topic into distributed transactions, event-driven architecture, rate limiting, database isolation levels, or observability for critical workflows. They may also ask you to compare Postgres transactions, Kafka-backed event streams, and scheduled batch jobs for different parts of the same payment lifecycle.
