What difficulty level is this interview question?

This is a hard difficulty System Design question, commonly asked during Technical Screen rounds at DoorDash.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at DoorDash during technical interviews.

Deep-dive a project architecture | DoorDash Interview Question

An open-ended DoorDash software-engineer system-design screen: pick a real project and deep-dive its architecture end to end. Candidates draw the diagram, state invariants, walk read/write paths, justify storage and trade-offs, capacity-plan, cover reliability/security/observability with SLOs, recount a real incident, and propose a quantified 10x-traffic plan. Includes a full worked example built around a real-time delivery dispatch and live order-tracking service.

Question

Pick one of your recent, non-trivial projects and conduct a deep, end-to-end technical review. This is an open-ended architecture deep-dive: there is no single reference design, so use a real system you built and own. The interviewer will probe each layer in turn, so be ready to whiteboard the system, quantify your claims, and defend your decisions under follow-up pressure.

Address all twelve parts below. Lead with requirements and the one or two invariants that make the problem hard before you draw boxes, drive the layers in order, and back every claim with concrete numbers and real examples from your project — avoid hand-waving.

Constraints & Assumptions

Use a real, non-trivial project you personally architected or owned a major part of — depth and defensibility matter more than the system's fame.
Assume the interviewer will go deep on 2–3 areas of their choosing and expect concrete figures (QPS, latency percentiles, message rates, storage), not adjectives.
You will whiteboard. Treat the diagram as a contract: every component, data store, and edge interface (REST / gRPC / streaming / WebSocket) should be labelled.
"It was fast / it scaled" does not count as an answer — pair each qualitative claim with a measured or estimated number and the assumption behind it.

Clarifying Questions to Ask

These scope the whole deep-dive before you start drawing; ask them up front, then choose the project that best lets you answer them concretely:

Depth vs. breadth — do you want me to drive all twelve layers at a steady depth, or go shallow on most and let you pick 2–3 areas to interrogate deeply?
Which project — should I pick the system I owned end-to-end, or the one closest to your domain (e.g. high-throughput real-time, transactional, or batch/analytics)?
Level of the audience — are we whiteboarding for correctness and trade-offs, or do you also want production numbers (real measured p95/p99, actual incident postmortems)?
Scope of "I" — for a system built by a team, do you want only the parts I personally designed, or the full system with my ownership boundary called out?

Part 1 — Architecture diagram

Draw the end-to-end architecture: components, data stores, and the interfaces (REST / gRPC / streaming / WebSocket) between them.

Part 2 — Data flow and invariants

Explain the major data flows and state the key correctness invariants the system must preserve (e.g. "exactly one active assignment per order").
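
One way to show that an invariant is enforced rather than merely hoped for is to push it into the datastore. The sketch below is a minimal illustration, assuming a relational store: a partial unique index rejects a second active assignment for the same order. The table, column, and function names are hypothetical, and SQLite only stands in for whatever database your project used.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical assignments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assignments ("
        " id INTEGER PRIMARY KEY,"
        " order_id TEXT NOT NULL,"
        " courier_id TEXT NOT NULL,"
        " status TEXT NOT NULL)"
    )
    # Partial unique index: at most one row per order_id may have status 'active'.
    conn.execute(
        "CREATE UNIQUE INDEX one_active_per_order"
        " ON assignments(order_id) WHERE status = 'active'"
    )
    return conn


def try_assign(conn: sqlite3.Connection, order_id: str, courier_id: str) -> bool:
    """Return True if the assignment was created, False if the invariant blocked it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO assignments (order_id, courier_id, status)"
                " VALUES (?, ?, 'active')",
                (order_id, courier_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, two writers racing to assign the same order cannot both succeed: the loser gets an integrity error and can retry or give up, which is the kind of concrete answer the "show me the interleaving" follow-up is looking for.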

Part 3 — Read and write paths

Walk through the concrete read path and write path step by step, including caching, idempotency, and the latency budget for each.
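
To make the idempotency point concrete, here is a minimal sketch of a write handler that deduplicates on a client-supplied idempotency key. The class name, the in-memory dictionary, and the `apply` callback are all assumptions for illustration; a real system would keep the key in a durable store and make the check-and-record step atomic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class IdempotentWriter:
    """Applies a write at most once per idempotency key (single-process sketch)."""

    apply: Callable[[dict], dict]
    _results: Dict[str, dict] = field(default_factory=dict)

    def handle(self, idempotency_key: str, payload: dict) -> dict:
        # A retried request with a known key returns the stored result
        # instead of applying the side effect a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self.apply(payload)
        self._results[idempotency_key] = result
        return result
```

When you walk the write path, say where the key is generated (usually the client), how long it is retained, and what happens if the first attempt is still in flight when the retry arrives.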

Part 4 — Data and storage

Describe the schema design and justify your storage choices (relational vs. NoSQL vs. cache vs. time-series / cold store), including partition / shard keys and indexes.
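
When you justify a shard key, it helps to show how a key maps to a shard. The sketch below assumes simple hash-modulo partitioning on a hypothetical order ID. It illustrates the idea and is not a recommendation over range partitioning or consistent hashing.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (not Python's salted hash())."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then take it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards
```

The trade-off to call out is that plain modulo sharding remaps most keys when `num_shards` changes, so be ready to explain how your project handled resharding and hot keys.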

Part 5 — Design decisions and trade-offs

Justify your major technology choices and call out the trade-offs you accepted (consistency vs. availability, push vs. poll, write-through vs. write-behind, etc.).

Part 6 — Scalability, bottlenecks, and capacity planning

Give back-of-the-envelope numbers (QPS, message rates, storage), identify the bottlenecks, and explain how you'd capacity-plan for them.
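
The arithmetic behind a back-of-the-envelope estimate is simple enough to write down. The helper below is a sketch with hypothetical inputs (daily event count, peak-to-average factor, bytes per event, retention); substitute your project's own numbers and state the assumption behind each one.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_events: int, peak_factor: float,
             bytes_per_event: int, retention_days: int) -> dict:
    """Derive average QPS, peak QPS, and raw storage from a few stated assumptions."""
    avg_qps = daily_events / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    storage_bytes = daily_events * bytes_per_event * retention_days
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "storage_gb": storage_bytes / 1e9,  # decimal gigabytes, before replication
    }
```

For example, 86.4 million events per day is 1,000 events per second on average. With a 3x peak factor that is 3,000 per second, and at 200 bytes per event kept for 30 days it is about 518 GB before replication and indexes.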

Part 7 — Consistency and reliability

Explain your consistency model, delivery guarantees, and reliability strategies (retries, idempotency, sagas, circuit breakers, DR / RPO / RTO).
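
As a concrete reference for the retry part of this answer, the sketch below shows capped exponential backoff with full jitter. `TransientError`, the default parameters, and the injectable `sleep` and `rng` hooks are assumptions made for illustration. Retries like this are only safe when the operation being retried is idempotent.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(op: Callable[[], T], attempts: int = 4, base: float = 0.1,
          cap: float = 2.0, sleep: Callable[[float], None] = time.sleep,
          rng: Callable[[], float] = random.random) -> T:
    """Call op up to `attempts` times, sleeping a jittered, capped delay between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # retry budget exhausted: surface the failure
            # Full jitter: a random delay in [0, min(cap, base * 2**attempt)).
            sleep(rng() * min(cap, base * 2 ** attempt))
    raise AssertionError("unreachable")
```

In the interview, pair this with the retry budget (how many attempts, across how many layers) and with what stops a retry storm, such as a circuit breaker or load shedding.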

Part 8 — Security and access controls

Cover authN / authZ, service-to-service security, encryption in transit / at rest, PII handling, and compliance.

Part 9 — Observability and SLAs/SLOs

Describe your logs, metrics, and traces, and define concrete SLIs / SLOs and the alerts / runbooks that back them.
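
An SLO becomes easier to defend when you can state the error budget it implies. The two helpers below are a small sketch of that arithmetic. The function names and the 30-day window default are assumptions, not values taken from any particular system.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of full unavailability allowed by an availability SLO over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def burn_rate(observed_error_ratio: float, slo: float) -> float:
    """How many times faster than the sustainable rate the budget is being spent."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return observed_error_ratio / (1.0 - slo)
```

A 99.9% availability SLO over 30 days allows roughly 43 minutes of downtime. An observed error ratio of 1% against that SLO is a burn rate of about 10, meaning the budget would be gone in about a tenth of the window. That is the kind of figure an alert threshold can be tied to.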

Part 10 — A significant incident or trade-off you handled

Describe a real production incident or hard trade-off decision, the root cause, and how you mitigated it.

Part 11 — Two concrete improvements for a 10× traffic increase

Propose two specific changes that would let the system absorb 10× traffic, and quantify why they work.
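
"Quantify why they work" can be as simple as a one-line model. As a purely hypothetical example of one such improvement, the sketch below models a read cache in front of a datastore: the load reaching the datastore is total traffic times the miss ratio. The function names and figures are illustrative assumptions.

```python
def origin_qps(total_qps: float, cache_hit_ratio: float) -> float:
    """Requests per second that miss the cache and reach the datastore."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_qps * (1.0 - cache_hit_ratio)


def required_hit_ratio(total_qps: float, origin_capacity_qps: float) -> float:
    """Minimum hit ratio that keeps datastore load within its capacity."""
    if total_qps <= 0:
        raise ValueError("total_qps must be positive")
    return max(0.0, 1.0 - origin_capacity_qps / total_qps)
```

Under this model, a datastore that comfortably serves about 200 reads per second at 1,000 QPS with an 80% hit ratio needs a hit ratio of about 98% to stay at the same load at 10,000 QPS. Whether that hit ratio is achievable depends on your key distribution and invalidation rules, which is exactly what the interviewer will probe next.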

Part 12 — What you would redesign today and why

With hindsight, what would you change about the original design? Justify why the change is worth it.

Follow-up Questions

Be ready for the interviewer to push past your main answer with probes like:

"Walk me through exactly what happens to an in-flight request when your hottest datastore shard fails — what does the client see, and how does the system recover?"
"Your invariant holds in the happy path. Show me the precise interleaving where two writers could violate it, and where your design stops it."
"At 100× rather than 10×, which of your two improvements stops working first, and what breaks next?"
"If you had to drop one of consistency, availability, or latency under a regional outage, which goes and why?"

