Design Real-Time Driver Pay Aggregation
Company: DoorDash
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Technical Screen
Design a production system that ingests a high-volume stream of delivery-order events and continuously computes each driver's (Dasher's) **daily pay**.
The core pay logic is simple in isolation: given a driver's sequence of order events for a day, sum the pay over the time they were actively working. The twist is a **peak-period multiplier** — during designated peak windows, the pay rate is multiplied (e.g., $\times 2$), so any work interval that straddles a peak boundary must be **split at the boundary** and each sub-interval paid at its correct rate.
The interview starts from this computation and deliberately expands into a full streaming/system-design discussion: how do you run this correctly and reliably at DoorDash scale, with messy input, huge volume, and real production failure modes?
### Constraints & Assumptions
- **Input:** JSON order events published by upstream services. Each event carries identifiers and a timestamp; the exact schema is open and should be agreed with the interviewer.
- **Delivery semantics:** at-least-once. Events may arrive **duplicated**, **out of order**, or **late**; some may be **malformed**.
- **Scale:** a full day's raw events for all drivers may **not fit in memory** on a single machine.
- **Availability:** the system must keep producing correct daily totals across **consumer crashes and service outages**.
- **Peak windows:** a set of `[start, end)` time ranges per day with an associated multiplier; intervals are half-open so boundaries don't double-count.
- **Correctness:** pay numbers feed real payouts — auditability and reproducibility matter more than shaving the last millisecond of latency.
### Part 1 — Core pay computation with peak-hour splitting
Given one driver's ordered work intervals for a day and a set of peak windows, compute the driver's total pay. A single interval may span zero, one, or several peak boundaries; each resulting sub-interval is paid at base rate or the peak-multiplied rate.
```hint Where to start
Reduce both the work intervals and the peak windows to points on a single timeline. Think about a **sweep** over sorted boundary points, or **interval intersection** between each work interval and each peak window.
```
```hint Splitting cleanly
Literally slicing each work interval at every peak boundary it crosses gets fiddly fast — multiple crossings, and double-counting at half-open boundaries. Before you write that, ask what the pay for one interval actually *depends on*. Is there a quantity you can measure per interval that captures the whole peak effect, so the boundary bookkeeping drops out?
```
```hint Edge cases
Zero-length intervals, intervals fully inside or fully outside a peak window, back-to-back peak windows, and an interval that exactly touches a boundary. Decide the half-open convention up front and apply it everywhere.
```
### Part 2 — End-to-end streaming architecture at scale
Design the production system that runs Part 1 continuously over a high-volume event stream. Cover the publisher/consumer flow, how events become work intervals, where state and results live, and the read path for "what is driver X's pay so far today?"
```hint Pipeline shape
Separate **lifecycle reconstruction** (turning raw order events into completed, paid intervals) from **aggregation** (summing intervals into a per-driver, per-day total). The two stages often want different partition keys.
```
```hint Partitioning
Partition by `order_id` to assemble an order's lifecycle on one consumer; partition by `driver_id` to keep a driver's running daily total local and lock-free. Name the trade-off rather than picking blindly.
```
### Part 3 — Correctness under messy input
Handle at-least-once delivery (duplicates), out-of-order and late events, and malformed events — without corrupting daily totals.
```hint Idempotency
Make every aggregate update **idempotent** using a stable key (an `event_id` or a deterministic composite). Dedup *before* the number moves, not after.
```
```hint Late data
Event-time processing with **watermarks** and a bounded grace period lets you admit late events correctly. Decide what happens to events that arrive after the window finalizes — silent drop, or an explicit **adjustment/correction** record.
```
### Part 4 — Scale, failure recovery, and upstream safety
Address the parts that break under real production load:
- **Large data:** a day's raw events exceed memory. How do you keep streaming state bounded, and how do you recompute history (backfill/reconciliation) when data is too large to sort in memory?
- **Failure recovery:** a consumer or the service crashes mid-day. How do you resume without losing or double-counting pay?
- **Upstream APIs:** if processing must call an upstream service in production, how do you protect both systems?
```hint Bounded state vs. replay
Keep only the **active keyed state** in memory; archive raw events to durable object storage so any day can be **replayed**. For oversized recomputation, reach for **external sort** or a distributed sort/batch job keyed by `(driver, day)` and event time.
```
```hint Don't double-count on recovery
The relationship between **offset commits** and **durable state** is what makes replay safe. Combine checkpointing with idempotent writes so re-reading the log after a crash converges to the same totals.
```
```hint Protecting upstream calls
The standard production toolkit: **retries with backoff + jitter**, **circuit breakers**, **rate limiting**, **bounded queues / backpressure**. Prefer cached reference data or async enrichment over a synchronous upstream call per event.
```
### Clarifying Questions to Ask
- What exactly is in an order event (fields, event types, schema versioning), and what defines a "work interval" — order accept→deliver, or active dash time?
- How is pay calculated at base rate (per-minute, per-order, per-mile, or a blend), and is the peak multiplier applied to the whole pay or only the time-based component?
- How are peak windows defined and distributed — static config, or a separate event stream — and can they change retroactively for a day already in progress?
- What freshness does the read path need (sub-second "pay so far today" vs. an end-of-day finalized number), and who consumes it (driver app, payouts batch, support)?
- What is the daily event volume and per-driver order rate, and what is the late-event distribution (how late can events realistically arrive)?
- When a finalized daily total later proves wrong, is in-place correction acceptable, or must we emit immutable adjustments for audit?
### What a Strong Answer Covers
- A correct, well-reasoned Part 1 algorithm with an explicit half-open-boundary convention and the obvious edge cases handled.
- A clear two-stage pipeline (lifecycle reconstruction → aggregation) with a **justified** partition-key choice per stage, not a hand-wave.
- Idempotency tied to a concrete key, plus a deliberate event-time + watermark + grace-period story for late/out-of-order data.
- A dead-letter / quarantine path for malformed events and invalid state transitions, kept off the hot aggregation path.
- A bounded-memory streaming-state design plus a separate batch/replay path for backfill and **reconciliation** (streaming output vs. recomputed truth).
- Crash recovery that is explicit about offset-commit vs. durable-state ordering, so replay neither loses nor double-counts pay.
- Upstream-protection patterns (retry/backoff, circuit breaker, rate limit, backpressure) with a reason to avoid synchronous per-event calls.
- Observability: the specific metrics that would catch a silent payout bug (consumer lag, dup rate, invalid-event rate, late-event volume, reconciliation deltas).
- Honest trade-offs: streaming freshness vs. batch reproducibility, immutability/replay vs. in-place mutation.
### Follow-up Questions
- Peak windows for today are corrected at 5pm (a window's multiplier changes). How does the already-accumulated pay for affected drivers get fixed, and what does the user-visible number do in the meantime?
- A bug shipped this morning over-counted peak pay for some drivers. Walk through detecting it, recomputing the correct totals from the durable log, and reconciling without paying anyone twice.
- The driver app needs "pay so far today" with sub-second latency for millions of drivers. How does the serving/read path differ from the write path, and how do you keep it cheap?
- A single hot driver (or hot order key) creates a partition skew that lags one consumer badly. How do you detect and mitigate it without breaking per-key ordering guarantees?
Quick Answer: This question evaluates competence in designing scalable, fault-tolerant streaming systems and implementing correct time-windowed aggregation logic, testing skills in distributed systems, stream processing, and data correctness; it belongs to the system design domain and streaming/real-time data engineering and spans both conceptual architectural thinking and practical application-level concerns. It is commonly asked to probe reasoning about correctness under at-least-once delivery and messy inputs (out-of-order, duplicate, late, or malformed events), operational resilience, and trade-offs between throughput, latency, and auditability in production-scale pipelines.