Design a resilient dasher payment system
Company: DoorDash
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Technical Screen
##### Question
Design an end-to-end payment system for DoorDash delivery drivers (Dashers) that computes payouts from order/delivery lifecycle events. The event stream contains lifecycle records of the form `{dasherId, orderId (a.k.a. deliveryId), timestamp, status}`, where `status ∈ {accepted, fulfilled, cancelled}` (some variants emit only `ACCEPT(orderId, dasherId, timestamp)` and `FULFILL(orderId, dasherId, timestamp)`). The Payment service consumes these events and computes earnings per dasher. Walk through and justify your design across the following:
1. **APIs and interfaces.** Define the contract between the event producer (Delivery) and the consumer (Payment): synchronous vs. asynchronous, request/response schemas, and the API Payment exposes to clients for querying a dasher's payout. What should a typical earnings response return (total-to-date, time-bounded totals, per-delivery line items) and why?
2. **Payout rules and calculation.** Specify what gets paid on `accepted` / `fulfilled`, how cancellations affect pay, and how Payment computes a dasher's earnings — totals over a time window and per-delivery breakdowns. Be explicit about which timestamp the payout is attributed to.
3. **Time handling.** Representation of `timestamp`, time zones, DST, ordering of events, and how to deal with late or out-of-order events.
4. **Data modeling and storage.** Tables/indexes (or in-memory structures) to support high-throughput ingestion, deduplication, idempotency keys, querying by `dasherId` and time range / pay period, and efficient recalculation.
5. **Idempotency and delivery semantics.** Deduplication, idempotency keys, and exactly-once vs. at-least-once ingestion. How do you avoid double-paying under at-least-once delivery?
6. **Bad / missing data — detection and remediation.** Assume happy-path inputs are valid, then proactively discuss how you detect and remediate data loss or corruption: a `FULFILL` arriving without a prior `ACCEPT`, an `ACCEPT` with no subsequent `FULFILL`, duplicate events, and out-of-order delivery.
7. **Batch vs. streaming, backfills, and reconciliation.** Streaming for near-real-time payouts vs. batch finalization at period close; how you backfill and reconcile.
8. **Reliability.** Error handling, retries, timeouts, circuit breakers, dead-letter queues — where you add try/catch, logging, and metrics.
9. **Consistency, monitoring, and trade-offs.** Consistency guarantees, correctness checks/invariants, monitoring and alerting, decoupling/versioning/contract testing, and the trade-offs you are making.
If time permits, sketch the key classes/methods for ingesting events and calculating payouts, and the API for querying a dasher's payout for a given **local** pay period (handling timezone and DST).
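As a starting point for the class sketch requested above, here is a minimal in-memory version of idempotent ingestion and payout calculation. It is not a definitive design. All names (`Event`, `PaymentLedger`, `FLAT_PAY`) are hypothetical, and the payout rule is an assumption, since the prompt gives none: a flat amount per `fulfilled` delivery, nothing for `accepted` alone, nothing if a `cancelled` event exists, with the payout attributed to the `fulfilled` timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

# Assumption: a hypothetical flat rate per fulfilled delivery.
FLAT_PAY = Decimal("5.00")


@dataclass(frozen=True)
class Event:
    dasher_id: str
    order_id: str
    timestamp: datetime  # timezone-aware UTC instant
    status: str          # "accepted" | "fulfilled" | "cancelled"


class PaymentLedger:
    """In-memory stand-in for the Payment service's event store."""

    def __init__(self):
        self._seen = set()   # idempotency keys already applied
        self._orders = {}    # (order_id, dasher_id) -> {status: Event}

    def ingest(self, e: Event) -> bool:
        """Apply an event once; return False if it is a duplicate."""
        key = (e.order_id, e.dasher_id, e.status)
        if key in self._seen:
            return False  # at-least-once redelivery: safe to ignore
        self._seen.add(key)
        self._orders.setdefault((e.order_id, e.dasher_id), {})[e.status] = e
        return True

    def line_items(self, dasher_id, start, end):
        """Per-delivery items whose fulfilled time falls in [start, end)."""
        items = []
        for (order_id, d_id), by_status in self._orders.items():
            fulfilled = by_status.get("fulfilled")
            if d_id != dasher_id or fulfilled is None:
                continue
            if "cancelled" in by_status:
                continue  # assumed rule: cancelled deliveries pay nothing
            if start <= fulfilled.timestamp < end:
                items.append((order_id, fulfilled.timestamp, FLAT_PAY))
        return sorted(items, key=lambda item: item[1])

    def total(self, dasher_id, start, end):
        amounts = (amt for _, _, amt in self.line_items(dasher_id, start, end))
        return sum(amounts, Decimal("0"))

    def anomalies(self):
        """(order_id, dasher_id) pairs with a FULFILL but no ACCEPT."""
        return sorted(k for k, s in self._orders.items()
                      if "fulfilled" in s and "accepted" not in s)
```

Because payouts are derived from the stored events instead of being incremented on arrival, out-of-order and late events only change the result of the next query. In this sketch a `fulfilled` event with no `accepted` event is still paid but is surfaced by `anomalies()` for reconciliation; withholding pay until the gap is resolved is an equally defensible policy.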
Quick Answer: A DoorDash software-engineer system-design interview question: design a resilient Dasher payment system that computes payouts from order lifecycle events (accept/fulfill/cancel). It spans event-driven ingestion, idempotency and exactly-once-vs-at-least-once semantics, data modeling for high write throughput and per-dasher pay-period queries, timezone/DST handling, detection and remediation of missing or out-of-order data, batch-vs-streaming reconciliation, reliability, and consistency trade-offs.
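For the timezone/DST part, one common approach is to store event timestamps as UTC instants and convert a dasher's local pay period into a half-open UTC range at query time. The sketch below illustrates this with Python's standard `zoneinfo` module. The function name `pay_period_utc_bounds` and the choice of a period that starts at local midnight are assumptions made for illustration.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo  # needs system tz data or the tzdata package


def pay_period_utc_bounds(period_start: date, tz_name: str, days: int = 7):
    """Return the [start, end) UTC instants of a local pay period.

    The end is found by advancing the calendar date and then resolving
    local midnight in the zone, so a period that spans a DST change is
    23 or 25 hours off a multiple of 24, as it should be.
    """
    tz = ZoneInfo(tz_name)
    start_local = datetime.combine(period_start, time.min, tzinfo=tz)
    end_local = datetime.combine(period_start + timedelta(days=days),
                                 time.min, tzinfo=tz)
    return (start_local.astimezone(timezone.utc),
            end_local.astimezone(timezone.utc))


# Week containing the 2024-03-10 spring-forward in Los Angeles.
start_utc, end_utc = pay_period_utc_bounds(date(2024, 3, 4),
                                           "America/Los_Angeles")
print(start_utc.isoformat(), end_utc.isoformat())
```

The resulting range can be passed directly to a query such as `fulfilled_at >= start AND fulfilled_at < end`, which keeps adjacent pay periods from overlapping or leaving gaps. Zones where local midnight is skipped or repeated on a transition day would need an explicit policy; this sketch does not handle them.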