Design a service aggregator with robust error handling


This question tests understanding of distributed-system design and reliability patterns: concurrency, timeboxing with global deadlines, retries and circuit breakers, resource isolation, payload merging, and observability. These are the skills needed to build an HTTP aggregator that tolerates partial failures and still returns well-formed degraded responses.


Question

System Design: HTTP Aggregator With Deadlines, Resilience, and Observability

Context

Build an HTTP aggregator that fans out to three independent downstream services in parallel and returns a single consolidated JSON response. Assume the downstreams are:

Service A: User Profile
Service B: Recent Orders
Service C: Recommendations

The aggregator must be production-grade with strong reliability, performance, and observability guarantees.

Requirements

API design
- Define the aggregator's request schema, response schema, and HTTP status codes.
- Show how missing/erroneous sub-responses are represented in the final JSON.
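One possible way to represent degraded sub-responses, sketched in Python under stated assumptions: a hypothetical `GET /v1/home?userId=...` endpoint whose response carries one section per downstream, each with a `status` and a nullable `data`, plus a top-level `errors` array and a `partial` flag. The status-code policy is also an assumption: 200 whenever the profile is present (even if orders or recommendations degraded), 504 if the profile timed out, and 502 if it failed. All class and field names are illustrative.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any, List, Optional


class SectionStatus(str, Enum):
    OK = "ok"
    FALLBACK = "fallback"  # a default or cached value was served instead of live data
    ERROR = "error"        # the downstream failed and no fallback applies
    TIMEOUT = "timeout"    # the per-call timeout or the global deadline was hit


@dataclass
class Section:
    status: SectionStatus
    data: Optional[Any] = None  # null whenever no usable payload exists


@dataclass
class SectionError:
    section: str
    code: str     # machine-readable, e.g. "DOWNSTREAM_TIMEOUT" (hypothetical code)
    message: str  # human-readable detail for logs and debugging


@dataclass
class AggregateResponse:
    request_id: str
    profile: Section
    orders: Section
    recommendations: Section
    errors: List[SectionError] = field(default_factory=list)

    @property
    def partial(self) -> bool:
        return bool(self.errors)

    def http_status(self) -> int:
        # Assumed policy: the profile is required; orders and recommendations are optional.
        if self.profile.status is SectionStatus.TIMEOUT:
            return 504
        if self.profile.status is SectionStatus.ERROR:
            return 502
        return 200

    def to_json(self) -> str:
        body = asdict(self)  # str-based Enum members serialize as their string values
        body["partial"] = self.partial
        return json.dumps(body)
```

Returning 200 for partial results lets clients render whatever arrived; they detect degradation from `partial` and `errors` rather than from the status code.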
Concurrency and timeboxing
- Call A, B, C in parallel.
- Define per-call timeouts and a global deadline (e.g., 300 ms) so a slow service does not block the whole request.
- If the global deadline is exceeded, cancel in-flight work and return a degraded but well-formed response.
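A minimal asyncio sketch of these timeboxing rules. Each call gets its own timeout, clipped so it never outlives the global deadline; when the deadline passes, unfinished tasks are cancelled (and awaited, so no work leaks past the response) and reported as timeouts. `fan_out` and the result tuples are hypothetical names, and budgets such as 150 ms for the profile inside a 300 ms global deadline would be tuning choices, not requirements.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple

Call = Callable[[], Awaitable[Any]]
Result = Tuple[str, Any]  # ("ok", payload) | ("timeout", None) | ("error", exception)


async def fan_out(calls: Dict[str, Call], per_call_timeout_s: Dict[str, float],
                  global_deadline_s: float) -> Dict[str, Result]:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + global_deadline_s

    async def guarded(name: str, call: Call) -> Any:
        # A per-call timeout may never extend past the global deadline.
        budget = min(per_call_timeout_s[name], deadline - loop.time())
        return await asyncio.wait_for(call(), timeout=max(budget, 0.0))

    tasks = {name: asyncio.create_task(guarded(name, call)) for name, call in calls.items()}
    _, pending = await asyncio.wait(tasks.values(), timeout=max(deadline - loop.time(), 0.0))

    # Deadline reached: cancel stragglers and wait for them to unwind so that no
    # downstream work keeps running after the response is sent.
    for task in pending:
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)

    results: Dict[str, Result] = {}
    for name, task in tasks.items():
        if task in pending or task.cancelled():
            results[name] = ("timeout", None)  # cut off by the global deadline
        elif isinstance(task.exception(), asyncio.TimeoutError):
            results[name] = ("timeout", None)  # per-call timeout fired first
        elif task.exception() is not None:
            results[name] = ("error", task.exception())
        else:
            results[name] = ("ok", task.result())
    return results
```

Because `asyncio.wait` returns at the deadline regardless of stragglers, handler latency stays close to `global_deadline_s`, provided the downstream coroutines honor cancellation.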
Resilience
- Handle errors and partial failures with retries (exponential backoff + jitter), circuit breaking, and sensible fallbacks/defaults.
- Ensure idempotency and avoid duplicating side effects on retries.
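The retry and circuit-breaking requirements can be sketched together, assuming a consecutive-failure breaker and "full jitter" backoff (sleep a uniform random time between 0 and `min(cap, base * 2^(attempt-1))`). Only calls flagged idempotent are retried; the three reads here qualify, while anything with side effects would need an idempotency key honored by the downstream before it could be replayed. `CircuitBreaker`, `call_with_retry`, and their parameters are hypothetical, and a production breaker would typically also limit concurrent half-open probes.

```python
import asyncio
import random
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised without calling the downstream while the breaker is open."""


class CircuitBreaker:
    """Consecutive-failure breaker: closed -> open -> half_open -> closed."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half_open"  # cool-down elapsed: let a probe through
        return "open"

    def before_call(self) -> None:
        if self.state == "open":
            raise CircuitOpenError()

    def on_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def on_failure(self) -> None:
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the cool-down


async def call_with_retry(op: Callable[[], Awaitable[T]], *, breaker: CircuitBreaker,
                          idempotent: bool, max_attempts: int = 3,
                          base_delay_s: float = 0.02, max_delay_s: float = 0.1,
                          retryable: tuple = (ConnectionError, asyncio.TimeoutError)) -> T:
    attempts = max_attempts if idempotent else 1  # never replay non-idempotent calls
    for attempt in range(1, attempts + 1):
        breaker.before_call()
        try:
            result = await op()
        except retryable:  # other errors (e.g. 4xx) propagate without tripping the breaker
            breaker.on_failure()
            if attempt == attempts:
                raise
            cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            await asyncio.sleep(random.uniform(0, cap))  # full jitter
        else:
            breaker.on_success()
            return result
    # CancelledError (e.g. from the global deadline) is not caught above and propagates.
    raise AssertionError("unreachable when max_attempts >= 1")
```

Pairing the breaker with a fallback means an open circuit costs microseconds instead of a full timeout: the handler would catch `CircuitOpenError` and serve the default immediately.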
Merging logic
- Describe how to merge the payloads (A=user profile, B=recent orders, C=recommendations) into one response.
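A sketch of one merging policy, under assumptions: the profile has no safe default, orders fall back to an empty list, and recommendations fall back to a static list. It also shows one example of a cross-section rule (dropping recommendations the user just ordered). The payload shapes (`itemId`, `items`), the error codes, and `DEFAULT_RECOMMENDATIONS` are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Outcome = Tuple[str, Any]  # ("ok", payload) | ("timeout", None) | ("error", exception)

# Hypothetical static fallback, e.g. a periodically refreshed "popular items" list.
DEFAULT_RECOMMENDATIONS: List[Dict[str, Any]] = [{"itemId": "popular-1"}, {"itemId": "popular-2"}]


def merge(request_id: str, profile: Outcome, orders: Outcome, recs: Outcome,
          max_recs: int = 10) -> Dict[str, Any]:
    errors: List[Dict[str, str]] = []

    def section(name: str, outcome: Outcome, fallback: Optional[Any]) -> Dict[str, Any]:
        status, value = outcome
        if status == "ok":
            return {"status": "ok", "data": value}
        # Every non-ok section is reported, even when a fallback hides it from the UI.
        errors.append({"section": name, "code": f"DOWNSTREAM_{status.upper()}",
                       "message": str(value) if value is not None else status})
        if fallback is None:
            return {"status": status, "data": None}
        return {"status": "fallback", "data": fallback}

    profile_s = section("profile", profile, None)  # no safe default for identity data
    orders_s = section("orders", orders, [])       # renders as "no recent orders"
    recs_s = section("recommendations", recs, DEFAULT_RECOMMENDATIONS)

    if recs_s["data"]:
        if orders_s["status"] == "ok":
            # Example cross-section rule: hide items the user just ordered.
            ordered = {item["itemId"] for order in orders_s["data"] for item in order.get("items", [])}
            recs_s["data"] = [r for r in recs_s["data"] if r["itemId"] not in ordered]
        recs_s["data"] = recs_s["data"][:max_recs]  # slicing copies; the default is never mutated

    return {"requestId": request_id, "profile": profile_s, "orders": orders_s,
            "recommendations": recs_s, "errors": errors, "partial": bool(errors)}
```

Keeping merging a pure function of the three outcomes makes it trivial to unit-test every combination of ok, timeout, and error without any network or timing.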
Resource management and isolation
- Discuss concurrency model, thread safety, resource limits (connection pools), rate limiting, and bulkheading.
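Two isolation primitives, sketched for a single-threaded asyncio process (coroutines never run in parallel there, so these counters need no locks): a per-downstream bulkhead that rejects instead of queueing once its concurrency cap is reached, and a token-bucket rate limiter. Connection pools would be bounded separately in the HTTP client, for example one client per downstream with aiohttp's `TCPConnector(limit_per_host=...)`. `Bulkhead`, `BulkheadFullError`, and `TokenBucket` are hypothetical names.

```python
import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(Exception):
    """Raised immediately when a downstream's concurrency cap is exhausted."""


class Bulkhead:
    """Caps in-flight calls to one downstream; fails fast instead of queueing."""

    def __init__(self, max_concurrent: int):
        self._sem = asyncio.Semaphore(max_concurrent)

    async def run(self, op: Callable[[], Awaitable[T]]) -> T:
        if self._sem.locked():  # no permit is free right now
            raise BulkheadFullError()
        async with self._sem:   # acquires without suspending, since a permit is free
            return await op()


class TokenBucket:
    """Allows `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Giving each downstream its own bulkhead keeps a stalled recommendations service from exhausting the capacity that profile and orders calls need.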
Observability
- Outline logging, metrics, and distributed tracing (including correlation IDs) for end-to-end visibility.
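A minimal observability sketch using only the standard library: a `contextvars` correlation ID (asyncio tasks copy the current context when created, so the ID follows every fan-out call), JSON log lines, and per-downstream latency and outcome counters. The in-memory dicts stand in for a real metrics backend, and the `X-Request-ID` header name is a common convention rather than a standard; for distributed tracing, the W3C Trace Context `traceparent` header would be propagated alongside it, typically through an OpenTelemetry SDK. All function names are hypothetical.

```python
import contextvars
import json
import logging
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager

REQUEST_ID = contextvars.ContextVar("request_id", default="-")

# Stand-ins for a metrics client: latency histogram samples and outcome counters.
LATENCY_MS = defaultdict(list)
OUTCOMES = defaultdict(int)


class JsonFormatter(logging.Formatter):
    """One JSON object per line, always tagged with the current correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": round(record.created, 3),
            "level": record.levelname,
            "msg": record.getMessage(),
            "request_id": REQUEST_ID.get(),
            **getattr(record, "fields", {}),
        })


def start_request(headers: dict) -> str:
    # Reuse the caller's ID when present so traces join up across services.
    rid = headers.get("X-Request-ID") or uuid.uuid4().hex
    REQUEST_ID.set(rid)
    return rid


def outbound_headers() -> dict:
    return {"X-Request-ID": REQUEST_ID.get()}


@contextmanager
def downstream_span(log: logging.Logger, service: str):
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except BaseException as exc:  # includes CancelledError from the global deadline
        outcome = type(exc).__name__
        raise
    finally:
        ms = (time.perf_counter() - start) * 1000
        LATENCY_MS[service].append(ms)
        OUTCOMES[(service, outcome)] += 1
        log.info("downstream call", extra={"fields": {
            "service": service, "outcome": outcome, "latency_ms": round(ms, 1)}})
```

Recording outcomes by exception type makes it easy to alert separately on timeouts, open circuits, and hard errors per downstream.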
Testing
- Describe the testing plan (unit, integration), including timeouts, cancellations, retries, partial failures, and circuit breaking.
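A sketch of a scripted test double that makes timeouts, partial failures, and flaky-then-recovering downstreams deterministic, followed by three example tests. `ScriptedDownstream` and the test names are hypothetical. Circuit-breaker tests benefit from the same idea: inject a fake clock so state transitions can be asserted without real sleeps.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class ScriptedDownstream:
    """Test double: each call consumes the next scripted step (the last step repeats).

    Steps: ("ok", value) | ("error", exception) | ("delay", seconds, value)
    """
    script: List[Tuple]
    calls: int = 0
    cancelled: int = 0

    async def __call__(self) -> Any:
        step = self.script[min(self.calls, len(self.script) - 1)]
        self.calls += 1
        if step[0] == "error":
            raise step[1]
        if step[0] == "delay":
            try:
                await asyncio.sleep(step[1])
            except asyncio.CancelledError:
                self.cancelled += 1  # proves cancellation reached the downstream
                raise
            return step[2]
        return step[1]


async def test_timeout_cancels_downstream():
    slow = ScriptedDownstream([("delay", 5.0, "late")])
    try:
        await asyncio.wait_for(slow(), timeout=0.05)
        raise AssertionError("expected a timeout")
    except asyncio.TimeoutError:
        pass
    assert slow.cancelled == 1


async def test_partial_failure_is_isolated():
    profile = ScriptedDownstream([("ok", {"name": "Ada"})])
    recs = ScriptedDownstream([("error", ConnectionError("reset"))])
    results = await asyncio.gather(profile(), recs(), return_exceptions=True)
    assert results[0] == {"name": "Ada"}
    assert isinstance(results[1], ConnectionError)


async def test_flaky_then_healthy():
    svc = ScriptedDownstream([("error", ConnectionError()), ("ok", "fine")])
    outcomes = []
    for _ in range(3):
        try:
            outcomes.append(await svc())
        except ConnectionError:
            outcomes.append("failed")
    assert outcomes == ["failed", "fine", "fine"]
```

The tests are plain coroutines; they can be run with `asyncio.run`, or under pytest with an async plugin such as pytest-asyncio.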
Code and structure
- Provide production-grade naming and code structure (modules/classes).
- Include pseudocode or code in a language of your choice implementing the handler and fan-out/fan-in with cancellation and retries.
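An end-to-end handler sketch in Python that ties together fan-out/fan-in, per-call timeouts, jittered retries, and cancellation at the global deadline. A possible (hypothetical) module layout would split this into `api/handler.py`, `clients/` (one client per downstream), `resilience/` (retry, breaker, bulkhead), `merge.py`, and `observability.py`; here everything sits in one file so it runs standalone. `HomeAggregator`, `DownstreamConfig`, and the error codes are illustrative, and retry delays are kept tiny to fit a 300 ms budget.

```python
import asyncio
import random
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class DownstreamConfig:
    name: str
    call: Callable[[str], Awaitable[Any]]  # hypothetical client method: user_id -> payload
    timeout_s: float
    max_attempts: int = 2
    required: bool = False  # a failed required section changes the HTTP status


class HomeAggregator:
    def __init__(self, downstreams: List[DownstreamConfig], global_deadline_s: float = 0.3):
        self.downstreams = downstreams
        self.global_deadline_s = global_deadline_s

    async def _call_with_retry(self, cfg: DownstreamConfig, user_id: str, deadline: float) -> Any:
        loop = asyncio.get_running_loop()
        for attempt in range(1, cfg.max_attempts + 1):
            remaining = deadline - loop.time()
            if remaining <= 0:
                raise asyncio.TimeoutError()
            try:
                # Reads are idempotent, so replaying them is safe.
                return await asyncio.wait_for(cfg.call(user_id), timeout=min(cfg.timeout_s, remaining))
            except (ConnectionError, asyncio.TimeoutError):
                if attempt == cfg.max_attempts:
                    raise
                backoff = random.uniform(0, 0.01 * 2 ** attempt)  # full jitter
                if loop.time() + backoff >= deadline:
                    raise  # no budget left for another attempt
                await asyncio.sleep(backoff)

    async def handle(self, user_id: str, request_id: str) -> Tuple[int, Dict[str, Any]]:
        loop = asyncio.get_running_loop()
        deadline = loop.time() + self.global_deadline_s
        tasks = {cfg.name: asyncio.create_task(self._call_with_retry(cfg, user_id, deadline))
                 for cfg in self.downstreams}
        _, pending = await asyncio.wait(tasks.values(), timeout=self.global_deadline_s)
        for task in pending:
            task.cancel()  # deadline hit: stop in-flight work
        if pending:
            await asyncio.gather(*pending, return_exceptions=True)

        body: Dict[str, Any] = {"requestId": request_id, "errors": []}
        status = 200
        for cfg in self.downstreams:
            task = tasks[cfg.name]
            if task in pending or task.cancelled() or isinstance(task.exception(), asyncio.TimeoutError):
                body[cfg.name], code = None, "TIMEOUT"
            elif task.exception() is not None:
                body[cfg.name], code = None, "UNAVAILABLE"
            else:
                body[cfg.name], code = task.result(), None
            if code:
                body["errors"].append({"section": cfg.name, "code": code})
                if cfg.required:
                    status = 504 if code == "TIMEOUT" else 502
        body["partial"] = bool(body["errors"])
        return status, body
```

Each attempt's timeout is clipped to whatever budget remains, and a retry is skipped when its backoff alone would cross the deadline, so retries never push the response past the global budget.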
