Design a service aggregator with robust error handling
Company: DoorDash
Role: Software Engineer
Category: System Design
Difficulty: Hard
Interview Round: Technical Screen
Design and implement an HTTP aggregator that calls three independent downstream services (A, B, C) in parallel and returns a single consolidated JSON response. Specify the aggregator's request/response schema and status codes. Define per-call timeouts and a global deadline (e.g., 300 ms) so one slow service does not block the whole request. Handle errors and partial failures with retries (with backoff/jitter), circuit breaking, and sensible fallbacks/defaults; ensure idempotency and do not duplicate side effects on retries. If the global deadline is exceeded, cancel in-flight work and return a degraded but well-formed response. Describe how you merge the three payloads (e.g., A=user profile, B=recent orders, C=recommendations) and how you represent missing/erroneous sub-responses in the final JSON. Discuss concurrency model, thread safety, resource limits (connection pools), rate limiting, and bulkheading. Outline logging, metrics, and distributed tracing for observability (including correlation IDs) and what you would test (unit/integration, timeouts, partial failures). Provide production-grade naming and code structure (modules/classes) and include pseudocode or code in a language of your choice.
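One way to approach the parallel fan-out, the per-call timeouts, and the global deadline is sketched below in Python with asyncio. This is a minimal illustration, not a definitive implementation. The names `aggregate`, `_call_section`, and `Fetcher`, the envelope fields (`status`, `data`, `error`, `degraded`), and the 200 ms per-call timeout are assumptions made for the example; only the 300 ms global deadline comes from the prompt. Each fetcher stands in for an HTTP call to A, B, or C.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# The 300 ms global deadline is from the prompt; the per-call value is assumed.
GLOBAL_DEADLINE_S = 0.300
PER_CALL_TIMEOUT_S = 0.200

# Hypothetical type: a zero-argument coroutine function wrapping one downstream call.
Fetcher = Callable[[], Awaitable[Any]]


async def _call_section(name: str, fetch: Fetcher, timeout_s: float) -> Dict[str, Any]:
    """Run one downstream call and wrap the outcome in a uniform envelope."""
    try:
        data = await asyncio.wait_for(fetch(), timeout=timeout_s)
        return {"status": "ok", "data": data, "error": None}
    except asyncio.TimeoutError:
        return {"status": "timeout", "data": None,
                "error": f"{name} exceeded {timeout_s}s"}
    except Exception as exc:  # cancellation is a BaseException and passes through
        return {"status": "error", "data": None, "error": type(exc).__name__}


async def aggregate(fetchers: Dict[str, Fetcher],
                    deadline_s: float = GLOBAL_DEADLINE_S,
                    per_call_timeout_s: float = PER_CALL_TIMEOUT_S) -> Dict[str, Any]:
    """Call all fetchers concurrently and merge them into one response body."""
    tasks = {
        name: asyncio.create_task(_call_section(name, fetch, per_call_timeout_s))
        for name, fetch in fetchers.items()
    }
    # Wait for everything, but never longer than the global deadline.
    done, pending = await asyncio.wait(set(tasks.values()), timeout=deadline_s)

    # Cancel in-flight work and let the cancelled tasks unwind.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    sections: Dict[str, Any] = {}
    for name, task in tasks.items():
        if task in done:
            sections[name] = task.result()
        else:
            sections[name] = {"status": "deadline_exceeded", "data": None,
                              "error": "global deadline exceeded"}

    # The body is always well-formed; "degraded" flags any non-ok section.
    degraded = any(s["status"] != "ok" for s in sections.values())
    return {"degraded": degraded, "sections": sections}


if __name__ == "__main__":
    async def profile() -> Dict[str, Any]:      # stand-in for service A
        await asyncio.sleep(0.01)
        return {"user_id": 1}

    async def orders() -> Dict[str, Any]:       # stand-in for a slow service B
        await asyncio.sleep(1.0)
        return {"orders": []}

    async def recommendations() -> Dict[str, Any]:  # stand-in for a failing service C
        raise RuntimeError("downstream failure")

    print(asyncio.run(aggregate({
        "profile": profile,
        "orders": orders,
        "recommendations": recommendations,
    })))
```

Keeping every section in the same envelope means clients can tell a missing sub-response from an empty one without parsing error strings. One reasonable choice, stated here as an assumption rather than a requirement of the prompt, is to return HTTP 200 with `degraded` set to true when at least one section succeeds, and reserve a 5xx status for the case where no section can be served.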
Quick Answer: This question evaluates understanding of distributed system design and reliability patterns: concurrency, per-call timeouts and a global deadline, retries and circuit breakers, resource isolation, payload merging, and observability. A strong answer shows how these fit together in an HTTP aggregator that tolerates partial failures and still returns a well-formed, degraded response.
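The retry and circuit-breaker patterns named above can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, `backoff_delay`, and `call_with_retries` are hypothetical names, the thresholds and delays are placeholder values, and the wrapped operation is assumed to be idempotent (for example a read-only GET), since retrying a non-idempotent call could duplicate side effects.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the downstream."""


class CircuitBreaker:
    """Consecutive-failure breaker. Not thread-safe; add a lock for shared use."""

    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True  # closed: calls flow normally
        # Half-open: once the cool-down has elapsed, trial calls are allowed.
        return self._clock() - self._opened_at >= self._reset_after_s

    def record_success(self) -> None:
        self._consecutive_failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._consecutive_failures += 1
        if self._consecutive_failures >= self._failure_threshold:
            self._opened_at = self._clock()  # open, or re-open after a failed trial


def backoff_delay(attempt: int, base_s: float = 0.02, cap_s: float = 0.1,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay below min(cap_s, base_s * 2**attempt)."""
    return rng() * min(cap_s, base_s * (2 ** attempt))


def call_with_retries(op: Callable[[], T], breaker: CircuitBreaker,
                      max_attempts: int = 3, deadline_s: float = 0.3,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> T:
    """Retry an idempotent operation within a time budget, guarded by a breaker."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    start = clock()
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; skipping downstream call")
        try:
            result = op()
        except Exception as exc:
            breaker.record_failure()
            last_error = exc
        else:
            breaker.record_success()
            return result
        if attempt == max_attempts - 1:
            break
        delay = backoff_delay(attempt)
        if clock() - start + delay >= deadline_s:
            break  # not enough budget left for another attempt
        sleep(delay)
    assert last_error is not None
    raise last_error
```

When the breaker is open or the retries are exhausted, the caller would map the exception to a fallback for that section rather than failing the whole request. In half-open state this sketch lets every caller through until one fails; a production breaker would usually limit that to a single trial call.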