PracHub

Design a service aggregator with robust error handling

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of distributed system design and reliability patterns—covering concurrency, timeboxing and global deadlines, retries and circuit breakers, resource isolation, payload merging, and observability—required for building an HTTP aggregator that handles partial failures and returns well-formed degraded responses.

  • hard
  • DoorDash
  • System Design
  • Software Engineer

Company: DoorDash

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Design and implement an HTTP aggregator that calls three independent downstream services (A, B, C) in parallel and returns a single consolidated JSON response. Specify the aggregator's request/response schema and status codes. Define per-call timeouts and a global deadline (e.g., 300 ms) so one slow service does not block the whole request. Handle errors and partial failures with retries (with backoff/jitter), circuit breaking, and sensible fallbacks/defaults; ensure idempotency and do not duplicate side effects on retries. If the global deadline is exceeded, cancel in-flight work and return a degraded but well-formed response. Describe how you merge the three payloads (e.g., A=user profile, B=recent orders, C=recommendations) and how you represent missing/erroneous sub-responses in the final JSON. Discuss concurrency model, thread safety, resource limits (connection pools), rate limiting, and bulkheading. Outline logging, metrics, and distributed tracing for observability (including correlation IDs) and what you would test (unit/integration, timeouts, partial failures). Provide production-grade naming and code structure (modules/classes) and include pseudocode or code in a language of your choice.
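One way to approach the parallel fan-out with per-call timeouts and a global deadline is sketched below using Python's asyncio. This is a minimal illustration under stated assumptions, not a reference solution: `fan_out`, `fake_service`, and the `status`/`data`/`error` result shape are hypothetical names introduced here, and real downstream calls would use an HTTP client rather than `asyncio.sleep`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# A downstream call is modelled as a zero-argument coroutine function (assumption).
Call = Callable[[], Awaitable[Any]]


async def fan_out(calls: Dict[str, Call],
                  per_call_timeout_s: float,
                  global_deadline_s: float) -> Dict[str, dict]:
    """Run all calls concurrently; never wait longer than the global deadline."""

    async def guarded(fn: Call) -> Any:
        # Per-call timeout: raises asyncio.TimeoutError if this one call is slow.
        return await asyncio.wait_for(fn(), timeout=per_call_timeout_s)

    tasks = {asyncio.create_task(guarded(fn)): name for name, fn in calls.items()}

    # Global deadline: wait() returns whatever has finished by then.
    _done, pending = await asyncio.wait(list(tasks), timeout=global_deadline_s)

    # Cancel in-flight work and let the cancelled tasks unwind.
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)

    results: Dict[str, dict] = {}
    for task, name in tasks.items():
        if task in pending or task.cancelled():
            results[name] = {"status": "timeout", "data": None,
                             "error": "global deadline exceeded"}
            continue
        exc = task.exception()
        if exc is None:
            results[name] = {"status": "ok", "data": task.result(), "error": None}
        elif isinstance(exc, asyncio.TimeoutError):
            results[name] = {"status": "timeout", "data": None,
                             "error": "per-call timeout"}
        else:
            results[name] = {"status": "error", "data": None,
                             "error": type(exc).__name__}
    return results


async def fake_service(delay_s: float, payload: Any, fail: bool = False) -> Any:
    """Hypothetical stand-in for a downstream HTTP call."""
    await asyncio.sleep(delay_s)
    if fail:
        raise RuntimeError("downstream failure")
    return payload


if __name__ == "__main__":
    demo_calls = {
        "profile": lambda: fake_service(0.01, {"user_id": 42}),
        "recent_orders": lambda: fake_service(1.0, []),          # too slow
        "recommendations": lambda: fake_service(0.01, None, fail=True),
    }
    print(asyncio.run(fan_out(demo_calls, per_call_timeout_s=0.2,
                              global_deadline_s=0.3)))
```

Every sub-call always yields an entry in the result map, so the handler can build a well-formed response even when some calls time out or fail. Setting the per-call timeout below the global deadline leaves headroom for merging and serialization.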

Related Interview Questions

  • Design a Food Rating System - DoorDash (medium)
  • Design a resilient bootstrap API - DoorDash (medium)
  • Design Real-Time Driver Pay Aggregation - DoorDash (hard)
  • Design Food Ratings and Driver Payouts - DoorDash (medium)
  • Design personalized restaurant search and recommendations - DoorDash (medium)
Jul 26, 2025, 12:00 AM

System Design: HTTP Aggregator With Deadlines, Resilience, and Observability

Context

Build an HTTP aggregator that fans out to three independent downstream services in parallel and returns a single consolidated JSON response. Assume the downstreams are:

  • Service A: User Profile
  • Service B: Recent Orders
  • Service C: Recommendations

The aggregator must be production-grade with strong reliability, performance, and observability guarantees.
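Given the three downstreams above, one possible way to represent the consolidated response is an envelope with one section per service, each carrying its own status. The sketch below is illustrative only: the field names (`request_id`, `degraded`, `status`, `data`, `error`), the status values, and the status-code policy (200 if at least one section succeeded, otherwise 503) are assumptions, not part of the question.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical section names for services A, B, and C.
SECTION_NAMES = ("profile", "recent_orders", "recommendations")


@dataclass
class SectionResult:
    status: str                  # assumed values: "ok", "timeout", "error", "circuit_open"
    data: Optional[Any] = None   # payload when status == "ok"
    error: Optional[str] = None  # short machine-readable reason otherwise


def merge_sections(request_id: str,
                   profile: SectionResult,
                   recent_orders: SectionResult,
                   recommendations: SectionResult) -> Dict[str, Any]:
    """Build one well-formed body; a failed section is present but marked."""
    sections = dict(zip(SECTION_NAMES, (profile, recent_orders, recommendations)))
    body: Dict[str, Any] = {
        "request_id": request_id,
        # True as soon as any section is missing or erroneous.
        "degraded": any(s.status != "ok" for s in sections.values()),
    }
    for name, section in sections.items():
        body[name] = {"status": section.status,
                      "data": section.data,
                      "error": section.error}
    return body


def http_status(body: Dict[str, Any]) -> int:
    """Assumed policy: 200 if any section succeeded, else 503."""
    any_ok = any(body[name]["status"] == "ok" for name in SECTION_NAMES)
    return 200 if any_ok else 503
```

Keeping every section key present, even on failure, means clients can parse the response with a single schema and decide per section whether to render a fallback. A stricter policy, such as treating the user profile as mandatory, would be an equally reasonable choice to discuss.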

Requirements

  1. API design
    • Define the aggregator's request schema, response schema, and HTTP status codes.
    • Show how missing/erroneous sub-responses are represented in the final JSON.
  2. Concurrency and timeboxing
    • Call A, B, C in parallel.
    • Define per-call timeouts and a global deadline (e.g., 300 ms) so a slow service does not block the whole request.
    • If the global deadline is exceeded, cancel in-flight work and return a degraded but well-formed response.
  3. Resilience
    • Handle errors and partial failures with retries (exponential backoff + jitter), circuit breaking, and sensible fallbacks/defaults.
    • Ensure idempotency and avoid duplicating side effects on retries.
  4. Merging logic
    • Describe how to merge the payloads (A=user profile, B=recent orders, C=recommendations) into one response.
  5. Resource management and isolation
    • Discuss concurrency model, thread safety, resource limits (connection pools), rate limiting, and bulkheading.
  6. Observability
    • Outline logging, metrics, and distributed tracing (including correlation IDs) for end-to-end visibility.
  7. Testing
    • Describe the testing plan (unit, integration), including timeouts, cancellations, retries, partial failures, and circuit breaking.
  8. Code and structure
    • Provide production-grade naming and code structure (modules/classes).
    • Include pseudocode or code in a language of your choice implementing the handler and fan-out/fan-in with cancellation and retries.
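The resilience requirements above (retries with exponential backoff and jitter, circuit breaking, and respect for the global deadline) could be sketched roughly as follows. This is a simplified illustration under assumptions: `CircuitBreaker`, `CircuitOpenError`, `call_with_retries`, and all default thresholds are hypothetical, the breaker is a bare consecutive-failure counter rather than a production implementation, and retrying is assumed to be safe because the three downstream calls are idempotent reads.

```python
import asyncio
import random
import time
from typing import Any, Awaitable, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    """Opens after N consecutive failures; allows trial calls after a cool-down."""

    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 10.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # Half-open: once the cool-down has elapsed, let calls through again.
        return time.monotonic() - self._opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = time.monotonic()


async def call_with_retries(fn: Callable[[], Awaitable[Any]],
                            breaker: CircuitBreaker,
                            deadline: float,
                            max_attempts: int = 3,
                            base_delay_s: float = 0.02,
                            max_delay_s: float = 0.1) -> Any:
    """Retry an idempotent call; `deadline` is an absolute time.monotonic() value."""
    last_error: Optional[BaseException] = None
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open")
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Each attempt is capped by whatever is left of the global budget.
            result = await asyncio.wait_for(fn(), timeout=remaining)
        except Exception as exc:  # cancellation is not swallowed on Python 3.8+
            breaker.record_failure()
            last_error = exc
        else:
            breaker.record_success()
            return result
        if attempt == max_attempts - 1:
            break
        # Full jitter: sleep a random time in [0, min(cap, base * 2^attempt)].
        backoff = random.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt))
        if time.monotonic() + backoff >= deadline:
            break  # not enough budget left for another attempt
        await asyncio.sleep(backoff)
    raise last_error or asyncio.TimeoutError("deadline exceeded before first attempt")
```

A caller would typically create one breaker per downstream service and pass `time.monotonic() + 0.3` as the deadline for a 300 ms budget. Non-idempotent calls would need an idempotency key or should not be retried at all.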
