Donation And Payment Platforms
Asked of: Software Engineer

What's being tested
These interviews test whether you can design a money-moving backend system where correctness matters more than raw feature velocity. DoorDash cares because donations, customer charges, refunds, Dasher payouts, and pay adjustments all require reliable state transitions across internal services, external payment processors, and asynchronous retries. The interviewer is probing for idempotency, transactional integrity, failure recovery, data modeling, API design, and your ability to reason about partial failures without hand-waving. A strong answer treats payments as a state machine backed by an auditable ledger, not as a single charge() function call.
Core knowledge
- Payment state machines should be explicit and monotonic: CREATED → AUTHORIZED → CAPTURED → SETTLED, or PLEDGED → PAYMENT_PENDING → PAID → REFUNDED, with FAILED as a terminal branch out of PAYMENT_PENDING rather than a step after PAID. Avoid ambiguous booleans like is_paid; they make retries, reversals, and reconciliation much harder.
- Idempotency keys are mandatory for any client- or worker-retryable operation. Store idempotency_key, request hash, response body, status, and expiration. If the same key arrives with a different payload, return 409 Conflict rather than executing a second charge or payout.
- Ledger modeling is safer than overwriting balances. Use immutable double-entry rows like account_id, entry_type, amount, currency, debit_credit, transaction_id, created_at. The invariant is that debits equal credits within each transaction, which supports audits and correction entries.
- Transactional outbox prevents the classic “database write succeeded but event publish failed” bug. Write business state and an outbox_events row in the same Postgres transaction, then have a relay publish to Kafka, SQS, or another queue with idempotent consumers.
- Webhook reconciliation treats external payment processors such as Stripe, Adyen, or Braintree as eventually consistent sources of truth. Validate signatures, persist raw webhook payloads, dedupe by provider event ID, and reconcile processor status against internal state.
- Retry semantics need clear boundaries. Retry transient failures like 5xx, network timeouts, and rate limits using exponential backoff with jitter; do not blindly retry validation failures, insufficient funds, expired cards, or processor-declared permanent failures.
- Exactly-once payment execution is usually implemented as at-least-once delivery plus idempotent side effects. Queues may redeliver messages, workers may crash after calling a provider, and webhooks may arrive out of order; correctness comes from dedupe tables and state guards.
- Payout computation should separate calculation from disbursement. For Dasher pay, compute immutable earning components per delivery, adjustment, bonus, or tip; aggregate into a payout batch; then move money only after the batch is finalized and auditable.
- API design should expose stable resource-oriented endpoints: POST /donations, GET /donations/{id}, POST /payout-computations, POST /payouts/{id}/retry. Include Idempotency-Key, ISO-8601 timestamps, minor currency units like cents, and structured errors with retryability flags.
- Concurrency control matters for limited-time donation campaigns and batch payouts. Use unique constraints, conditional updates such as WHERE status = 'PENDING', row locks where necessary, and optimistic version fields to prevent double capture, over-allocation, or duplicate batch execution.
- Observability should be designed into the workflow. Track payment_success_rate, payment_failure_rate, retry_count, webhook_lag_seconds, stuck_pending_count, duplicate_request_count, and p99 latency. Logs should include payment_id, provider_charge_id, idempotency_key, and correlation_id.
- Compliance and security should keep card data out of your system unless absolutely necessary. Use provider tokenization, avoid storing PAN/CVV, encrypt sensitive fields, enforce least-privilege access, and design as though PCI DSS scope reduction is a hard requirement.
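The idempotency-key rule above can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: IdempotencyStore, StoredResponse, and create_donation are hypothetical names, the dict stands in for a table with a unique index on idempotency_key, and expiration and concurrent writers are deliberately left out.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class StoredResponse:
    request_hash: str
    status_code: int
    body: dict


class IdempotencyStore:
    """In-memory stand-in for a table with a unique index on idempotency_key."""

    def __init__(self):
        self._records = {}

    def execute(self, key, payload, handler):
        # Hash a canonical form of the payload so key reuse can be detected.
        request_hash = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        existing = self._records.get(key)
        if existing is not None:
            if existing.request_hash != request_hash:
                # Same key, different payload: refuse instead of charging again.
                return 409, {"error": "idempotency_key_reused", "retryable": False}
            # Same key, same payload: replay the stored response.
            return existing.status_code, existing.body
        status_code, body = handler(payload)
        self._records[key] = StoredResponse(request_hash, status_code, body)
        return status_code, body


charges = []  # records how many times the side effect actually ran


def create_donation(payload):
    charges.append(payload["amount_cents"])
    return 201, {"donation_id": f"don_{len(charges)}", "status": "PENDING"}


store = IdempotencyStore()
first = store.execute("key-1", {"amount_cents": 500}, create_donation)
replay = store.execute("key-1", {"amount_cents": 500}, create_donation)
conflict = store.execute("key-1", {"amount_cents": 900}, create_donation)
```

Here the replay returns the stored response without running the handler again, and the mismatched payload is rejected. In a real service the key would typically be reserved inside the same database transaction as the business write so that two concurrent requests cannot both pass the lookup.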
Worked example
For “Design an async donation payment platform”, a strong candidate would first clarify scope: are donations one-time or recurring, do we support refunds, what payment processor is assumed, what traffic spike should we handle, and is the donation considered complete when the processor authorizes, captures, or settles funds? Then declare assumptions: use tokenized payment methods, store amounts in minor units, use Postgres for transactional records, and use a queue for asynchronous payment processing.
The answer can be organized around four pillars: data model, request flow, asynchronous worker processing, and reconciliation. The data model should include donations, payment_attempts, ledger_entries, webhook_events, and outbox_events, with unique constraints on idempotency_key and provider IDs. The request flow should return quickly after creating a PENDING donation and enqueueing work, rather than blocking the caller on a processor call that may time out.
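One way to picture the request flow is a transactional outbox: the PENDING donation and its outbox row commit together, and a separate relay publishes later. The sketch below uses SQLite from the Python standard library purely so it runs anywhere; the column names, the donation.created event type, and the create_donation and publish_pending helpers are assumptions for illustration, and a real deployment would use Postgres and a queue client.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE donations (
    id TEXT PRIMARY KEY,
    idempotency_key TEXT NOT NULL UNIQUE,
    amount_minor INTEGER NOT NULL,
    currency TEXT NOT NULL,
    status TEXT NOT NULL
);
CREATE TABLE outbox_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    published_at TEXT
);
""")


def create_donation(conn, idempotency_key, amount_minor, currency):
    """Insert the donation and its outbox event in one transaction."""
    donation_id = f"don_{uuid.uuid4().hex}"
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO donations VALUES (?, ?, ?, ?, 'PENDING')",
                (donation_id, idempotency_key, amount_minor, currency),
            )
            conn.execute(
                "INSERT INTO outbox_events (event_type, payload) VALUES (?, ?)",
                ("donation.created", json.dumps({"donation_id": donation_id})),
            )
    except sqlite3.IntegrityError:
        # Unique constraint hit: this key was already used, return the original.
        row = conn.execute(
            "SELECT id FROM donations WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        return row[0]
    return donation_id


def publish_pending(conn, publish):
    """Relay: publish unpublished rows, then mark them. May republish after a crash."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox_events "
        "WHERE published_at IS NULL ORDER BY id"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute(
                "UPDATE outbox_events SET published_at = datetime('now') WHERE id = ?",
                (event_id,),
            )
    return len(rows)


first_id = create_donation(conn, "key-1", 500, "USD")
same_id = create_donation(conn, "key-1", 500, "USD")
published = []
count = publish_pending(conn, lambda kind, body: published.append((kind, body)))
```

Because the relay marks a row only after publishing, a crash between the two steps leads to a duplicate publish rather than a lost event, which is why the consumers downstream still need to be idempotent.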
The worker should claim pending attempts, call the payment provider with its own idempotency key, and transition state only if the current state still allows it. Webhooks should be treated as authoritative signals but not blindly trusted: verify the signature, dedupe the event, and reconcile state transitions. One tradeoff to flag is synchronous versus asynchronous confirmation: synchronous gives the user immediate feedback but increases tail latency and timeout ambiguity; asynchronous improves resilience but requires a status endpoint and better UX around pending donations.
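The "transition state only if the current state still allows it" step can be expressed as a conditional update. The following is a small sketch under assumed names: the payment_attempts columns, the status values, and the ALLOWED table are illustrative, and SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment_attempts (id TEXT PRIMARY KEY, status TEXT NOT NULL)"
)
conn.execute("INSERT INTO payment_attempts VALUES ('att_1', 'PENDING')")
conn.commit()

# Hypothetical transition table; anything not listed is rejected.
ALLOWED = {
    "PENDING": {"PROCESSING"},
    "PROCESSING": {"SUCCEEDED", "FAILED"},
    "SUCCEEDED": {"REFUNDED"},
}


def transition(conn, attempt_id, from_status, to_status):
    """Move the attempt only if it is still in from_status (compare-and-set)."""
    if to_status not in ALLOWED.get(from_status, set()):
        raise ValueError(f"illegal transition {from_status} -> {to_status}")
    with conn:
        cur = conn.execute(
            "UPDATE payment_attempts SET status = ? WHERE id = ? AND status = ?",
            (to_status, attempt_id, from_status),
        )
    # rowcount is 0 when another worker or webhook already moved the row.
    return cur.rowcount == 1


claimed = transition(conn, "att_1", "PENDING", "PROCESSING")
duplicate_claim = transition(conn, "att_1", "PENDING", "PROCESSING")
succeeded = transition(conn, "att_1", "PROCESSING", "SUCCEEDED")
late_failure = transition(conn, "att_1", "PROCESSING", "FAILED")
```

Only the first claim wins, and a late or out-of-order failure signal cannot overwrite a recorded success. The losing caller sees False and can re-read the row to decide whether anything is left to do.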
A strong close would say: “If I had more time, I’d go deeper on refund flows, backfill/reconciliation jobs, and operational dashboards for stuck payments and webhook lag.”
A second angle
For “Design a resilient dasher payment system”, the same core ideas apply, but the center of gravity shifts from customer charges to earned-balance correctness and payout batching. Instead of donation records, the key entities are deliveries, pay components, adjustments, ledger entries, payout batches, and disbursement attempts. The system must tolerate late corrections, duplicate delivery events, and retries from payout providers without paying a Dasher twice.
The important framing difference is that pay computation should be reproducible and auditable: given the same earning inputs and policy version, the result should be explainable. You would likely emphasize immutable earning events, batch finalization, and double-entry ledgers more than user-facing checkout latency. The same idempotency and reconciliation patterns still apply when calling external payout rails.
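As a rough illustration of reproducible pay computation, the sketch below aggregates immutable earning components into per-Dasher totals. EarningComponent, compute_payout_batch, and the field names are hypothetical; the point is only that duplicates are dropped by component ID and the result does not depend on input order.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EarningComponent:
    component_id: str   # unique per delivery/adjustment/bonus/tip event
    dasher_id: str
    kind: str           # e.g. "base_pay", "tip", "adjustment"
    amount_minor: int   # cents; negative for corrections


def compute_payout_batch(components, policy_version):
    """Deterministically aggregate components; duplicates by ID are ignored."""
    seen = set()
    totals = defaultdict(int)
    for component in sorted(components, key=lambda c: c.component_id):
        if component.component_id in seen:
            continue  # duplicate delivery event, already counted
        seen.add(component.component_id)
        totals[component.dasher_id] += component.amount_minor
    return {
        "policy_version": policy_version,  # recorded so the result is explainable
        "component_ids": sorted(seen),
        "totals_minor": dict(sorted(totals.items())),
    }


components = [
    EarningComponent("c1", "dasher_1", "base_pay", 700),
    EarningComponent("c2", "dasher_1", "tip", 300),
    EarningComponent("c1", "dasher_1", "base_pay", 700),    # duplicate event
    EarningComponent("c3", "dasher_1", "adjustment", -100),  # late correction
    EarningComponent("c4", "dasher_2", "base_pay", 500),
]
batch = compute_payout_batch(components, policy_version="2024-01")
reordered = compute_payout_batch(list(reversed(components)), policy_version="2024-01")
```

Keeping the computation pure like this separates it from disbursement: the batch can be recomputed and compared before any money moves.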
Common pitfalls
Pitfall: Treating payment as a single synchronous API call.
A tempting answer is “the API calls Stripe, stores success or failure, and returns.” That misses the hard part: provider timeouts, duplicate requests, delayed webhooks, and partial failures. A better answer models payment attempts, persists intermediate states, and reconciles asynchronously.
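To make the timeout case concrete, here is a hedged sketch of retry classification with exponential backoff and full jitter. The outcome strings, the RETRYABLE set, and the NEEDS_RECONCILIATION status are assumptions for illustration; the operation callback is expected to reuse the same provider idempotency key on every attempt.

```python
import random
import time

# Hypothetical outcome codes; anything outside RETRYABLE is treated as permanent.
RETRYABLE = {"timeout", "rate_limited", "provider_5xx"}


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(operation, max_attempts=5, sleep=time.sleep):
    """Retry transient outcomes only; never loop on permanent failures."""
    for attempt in range(max_attempts):
        outcome = operation()
        if outcome == "ok":
            return "SUCCEEDED", attempt + 1
        if outcome not in RETRYABLE:
            return "FAILED", attempt + 1
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    # Still ambiguous after all attempts: leave it for asynchronous reconciliation.
    return "NEEDS_RECONCILIATION", max_attempts


def no_sleep(seconds):
    pass  # keeps the example instant


flaky = iter(["timeout", "provider_5xx", "ok"])
recovered = call_with_retries(lambda: next(flaky), sleep=no_sleep)
declined = call_with_retries(lambda: "insufficient_funds", sleep=no_sleep)
stuck = call_with_retries(lambda: "timeout", max_attempts=3, sleep=no_sleep)
```

A timeout does not say whether the provider charged the card, so the exhausted case is parked in an intermediate state instead of being marked failed, and a reconciliation job or webhook settles it later.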
Pitfall: Saying “use Kafka” without explaining correctness.
Queues do not solve duplicate execution by themselves. Messages can be delivered more than once, consumers can crash mid-processing, and ordering is not guaranteed globally. The stronger answer is “use at-least-once delivery with idempotent consumers, unique constraints, state-machine guards, and an outbox.”
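An idempotent consumer can be sketched with a dedupe table that commits together with the side effect. The processed_messages and payouts tables and the handle_message helper below are illustrative assumptions, with SQLite standing in for the service database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY);
CREATE TABLE payouts (
    payout_id TEXT PRIMARY KEY,
    amount_minor INTEGER NOT NULL
);
""")


def handle_message(conn, message):
    """Apply the message at most once, even if the queue delivers it repeatedly."""
    try:
        with conn:  # dedupe marker and side effect commit or roll back together
            conn.execute(
                "INSERT INTO processed_messages VALUES (?)",
                (message["message_id"],),
            )
            conn.execute(
                "INSERT INTO payouts VALUES (?, ?)",
                (message["payout_id"], message["amount_minor"]),
            )
    except sqlite3.IntegrityError:
        # Already processed, or the payout already exists: treat as a no-op.
        return False
    return True


deliveries = [
    {"message_id": "m1", "payout_id": "po_1", "amount_minor": 900},
    {"message_id": "m1", "payout_id": "po_1", "amount_minor": 900},  # redelivery
    {"message_id": "m2", "payout_id": "po_2", "amount_minor": 500},
]
results = [handle_message(conn, message) for message in deliveries]
payout_count = conn.execute("SELECT COUNT(*) FROM payouts").fetchone()[0]
```

This only protects effects that live in the same database. A call to an external provider sits outside the transaction, so it still needs its own idempotency key and a state guard.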
Pitfall: Ignoring auditability and reversals.
For money systems, updating a balance column directly is usually not enough. Interviewers expect you to preserve history, support refunds or adjustments, and explain how finance or support can answer “what happened to this dollar?” Immutable ledger entries plus correction transactions land much better.
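The ledger idea can be illustrated with a small append-only structure that rejects unbalanced transactions. LedgerEntry, Ledger, the account names, and the convention that balances are credits minus debits are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    transaction_id: str
    account_id: str
    debit_credit: str   # "DEBIT" or "CREDIT"
    amount_minor: int   # always positive; direction comes from debit_credit
    currency: str


class Ledger:
    """Append-only ledger: entries are never updated or deleted."""

    def __init__(self):
        self._entries = []

    def post(self, entries):
        entries = list(entries)
        if not entries:
            raise ValueError("empty transaction")
        if len({e.transaction_id for e in entries}) != 1:
            raise ValueError("entries must share one transaction_id")
        if len({e.currency for e in entries}) != 1:
            raise ValueError("entries must share one currency")
        if any(e.debit_credit not in ("DEBIT", "CREDIT") for e in entries):
            raise ValueError("debit_credit must be DEBIT or CREDIT")
        if any(e.amount_minor <= 0 for e in entries):
            raise ValueError("amounts must be positive")
        debits = sum(e.amount_minor for e in entries if e.debit_credit == "DEBIT")
        credits = sum(e.amount_minor for e in entries if e.debit_credit == "CREDIT")
        if debits != credits:
            raise ValueError("unbalanced transaction")
        self._entries.extend(entries)

    def balance(self, account_id):
        # Assumed convention: credits increase the balance, debits decrease it.
        return sum(
            e.amount_minor if e.debit_credit == "CREDIT" else -e.amount_minor
            for e in self._entries
            if e.account_id == account_id
        )

    def history(self, account_id):
        return [e for e in self._entries if e.account_id == account_id]


ledger = Ledger()
# Donation of 500 cents captured.
ledger.post([
    LedgerEntry("txn_1", "processor_clearing", "DEBIT", 500, "USD"),
    LedgerEntry("txn_1", "charity_payable", "CREDIT", 500, "USD"),
])
# Partial refund of 200 cents recorded as a new correction transaction.
ledger.post([
    LedgerEntry("txn_2", "charity_payable", "DEBIT", 200, "USD"),
    LedgerEntry("txn_2", "processor_clearing", "CREDIT", 200, "USD"),
])
payable = ledger.balance("charity_payable")
trail = ledger.history("charity_payable")
```

The refund does not touch the original rows. The balance is derived from the entries, and the history for an account answers the “what happened to this dollar?” question directly.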
Connections
Interviewers may pivot from this topic into distributed transactions, event-driven architecture, rate limiting, database isolation levels, or observability for critical workflows. They may also ask you to compare Postgres transactions, Kafka-backed event streams, and scheduled batch jobs for different parts of the same payment lifecycle.
Further reading
- Stripe API Idempotent Requests: Practical reference for idempotency-key behavior, replayed responses, and conflict handling.
- Designing Data-Intensive Applications: Strong background on reliability, transactions, logs, streams, and distributed-system failure modes.
- Martin Kleppmann, “Transactions: Myths, Surprises and Opportunities”: Useful context on transaction semantics and why distributed correctness is subtle.
Practice questions
- Handle payment-service outages (DoorDash · Software Engineer · Onsite · easy)
- Design an API for pay computation with retries (DoorDash · Software Engineer · Take-home Project · medium)
- Design a donations service with 3-day rolling totals (DoorDash · Software Engineer · Onsite · medium)
- Design an async donation payment platform (DoorDash · Software Engineer · Onsite · hard)
- Design a resilient dasher payment system (DoorDash · Software Engineer · Technical Screen · hard)
- Design donation database and failure handling (DoorDash · Software Engineer · Onsite · hard)
- Design a 3-day donation platform (DoorDash · Software Engineer · Onsite · hard)
- Design payment and delivery services for dasher payouts (DoorDash · Software Engineer · Technical Screen · hard)
- Design limited-time donation platform (DoorDash · Software Engineer · Technical Screen · hard)
- Handle a payment-service incident with resource spikes (DoorDash · Software Engineer · Technical Screen · hard)