Build an API aggregator with concurrency and retries

Q: Build an API aggregator with concurrency and retries

This question tests the ability to build a resilient API aggregation service: parallel upstream calls with futures/promises, per-call and overall timeouts, configurable failure policies (WAIT_ALL vs. FAIL_FAST), retries with capped exponential backoff and jitter, partial-failure handling, and observability through structured logs and metrics. It is commonly asked in system design interviews to probe practical implementation of concurrency and resilience patterns, the trade-offs among failure policies and timeouts, and the ability to define clear interfaces and monitoring. Category: System Design. Level: practical application with architectural and conceptual considerations.

Question

Build an Aggregation Service with Parallel Calls, Timeouts, Retries, and Observability

Context

You are designing a backend service that exposes a single HTTP endpoint. When called, the endpoint must call three external HTTP APIs in parallel, aggregate their responses, and return a combined JSON result. The service must be robust to timeouts and failures, and it must include retries, observability, and clear code organization.

Assume a typed language with futures/promises support (e.g., Java with CompletableFuture). You may choose reasonable defaults and make minimal assumptions if needed.

Requirements

Endpoint
- Expose one endpoint (e.g., GET /aggregate) that returns a combined JSON response from three upstream services: A, B, and C.
Concurrency
- Call the three upstream HTTP APIs in parallel using futures/promises.
Timeouts
- Per-call timeout for each upstream request.
- Overall request timeout (deadline) for the whole aggregation request.
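One possible way to combine the parallel calls with both timeouts is sketched below. It assumes Java 9 or later (for CompletableFuture.orTimeout) and models each upstream call as a Supplier<String>. The names ParallelFanOut and callAll are hypothetical and not part of the question.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: starts all upstream calls at once and bounds them with two timeouts. */
final class ParallelFanOut {

    /**
     * Runs every call on the given executor. Each call gets its own timeout,
     * and the combined future gets an overall deadline.
     * Note: orTimeout only completes the future exceptionally; it does not
     * interrupt the thread that is still running the call.
     */
    static CompletableFuture<Map<String, String>> callAll(
            Map<String, Supplier<String>> calls,
            Duration perCallTimeout,
            Duration overallTimeout,
            Executor executor) {

        Map<String, CompletableFuture<String>> futures = new LinkedHashMap<>();
        calls.forEach((name, call) -> futures.put(name,
                CompletableFuture.supplyAsync(call, executor)
                        .orTimeout(perCallTimeout.toMillis(), TimeUnit.MILLISECONDS)));

        return CompletableFuture
                .allOf(futures.values().toArray(new CompletableFuture[0]))
                .orTimeout(overallTimeout.toMillis(), TimeUnit.MILLISECONDS)
                .thenApply(ignored -> {
                    // Every future has completed successfully here, so join() does not block.
                    Map<String, String> results = new LinkedHashMap<>();
                    futures.forEach((name, f) -> results.put(name, f.join()));
                    return results;
                });
    }
}
```

In this sketch a per-call timeout surfaces as a CompletionException whose cause is a TimeoutException. A dedicated executor is passed in because blocking HTTP calls on the common pool could starve other work.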
Policy
- Configurable policy to determine behavior:
  - WAIT_ALL: wait for all upstreams, return partial data with defaults if some fail.
  - FAIL_FAST: fail the overall request as soon as any upstream fails or times out.
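The two policies could be expressed over already-started futures roughly as follows. This is a sketch under assumptions: upstream payloads are plain strings, a single default value stands in for every failed upstream, and FailurePolicy and PolicyCombiner are hypothetical names. Error details are left out to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Hypothetical enum mirroring the two policies named in the requirements. */
enum FailurePolicy { WAIT_ALL, FAIL_FAST }

final class PolicyCombiner {

    /** Combines named upstream futures according to the chosen policy. */
    static CompletableFuture<Map<String, String>> combine(
            Map<String, CompletableFuture<String>> upstreams,
            FailurePolicy policy,
            String defaultValue) {

        CompletableFuture<Map<String, String>> result = new CompletableFuture<>();

        if (policy == FailurePolicy.FAIL_FAST) {
            // The first upstream to fail (or time out) fails the whole request,
            // without waiting for the upstreams that are still running.
            upstreams.forEach((name, f) -> f.whenComplete((value, err) -> {
                if (err != null) {
                    result.completeExceptionally(
                            new IllegalStateException("upstream " + name + " failed", err));
                }
            }));
        }

        CompletableFuture.allOf(upstreams.values().toArray(new CompletableFuture[0]))
                .whenComplete((ignored, err) -> {
                    if (policy == FailurePolicy.FAIL_FAST && err != null) {
                        return; // already failed by the handler above
                    }
                    // All futures are done: keep successes, substitute the default for failures.
                    Map<String, String> out = new LinkedHashMap<>();
                    upstreams.forEach((name, f) ->
                            out.put(name, f.isCompletedExceptionally() ? defaultValue : f.join()));
                    result.complete(out);
                });

        return result;
    }
}
```

Under WAIT_ALL the combined future always completes normally once every upstream has finished, so per-call timeouts are what keep it from waiting indefinitely.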
Retries
- Implement retries with capped exponential backoff and jitter via a reusable RetryTemplate that accepts a Callable.
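A minimal hand-rolled RetryTemplate along these lines might look like the sketch below. It is not Spring Retry's class of the same name. The constructor parameters, the "full jitter" choice (sleep a random time between zero and the capped backoff), and the decision to retry on every exception are assumptions made for illustration.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical blocking retry helper with capped exponential backoff and full jitter. */
final class RetryTemplate {
    private final int maxAttempts;
    private final Duration baseDelay;
    private final Duration maxDelay;

    RetryTemplate(int maxAttempts, Duration baseDelay, Duration maxDelay) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        if (baseDelay.isNegative() || maxDelay.isNegative()) {
            throw new IllegalArgumentException("delays must not be negative");
        }
        this.maxAttempts = maxAttempts;
        this.baseDelay = baseDelay;
        this.maxDelay = maxDelay;
    }

    /** Upper bound for the delay before retry number `retry` (1-based): min(maxDelay, baseDelay * 2^(retry-1)). */
    long backoffCapMillis(int retry) {
        double exponential = baseDelay.toMillis() * Math.pow(2, retry - 1);
        return (long) Math.min((double) maxDelay.toMillis(), exponential);
    }

    /** Calls the task up to maxAttempts times and rethrows the last failure if all attempts fail. */
    <T> T execute(Callable<T> task) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // never retry an interrupt
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) break;
                // Full jitter: random delay in [0, cap) so concurrent callers do not retry in lockstep.
                long cap = backoffCapMillis(attempt);
                long sleepMillis = (long) (ThreadLocalRandom.current().nextDouble() * cap);
                Thread.sleep(sleepMillis);
            }
        }
        throw last;
    }
}
```

A production version would probably retry only on errors known to be transient (for example timeouts or 5xx responses) and would stop once the overall request deadline has passed.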
Partial Failure Handling
- When some upstreams fail, return partial data along with default values and error details.
Observability
- Structured logging with correlation IDs.
- Metrics (latency, success/fail counts, timeouts, retries).
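To make the observability requirement concrete, here is a JDK-only sketch of a key=value log line that carries a correlation ID and a thread-safe counter registry. The class names, metric names, and log format are all hypothetical. In practice a logging framework's context map (for example SLF4J's MDC) and a metrics library would likely take their place.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-memory counters, e.g. "upstream.A.success" or "upstream.A.retry". */
final class CallMetrics {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    void increment(String name) {
        add(name, 1);
    }

    /** Adds an amount to a counter; useful for accumulating latency in milliseconds. */
    void add(String name, long amount) {
        counters.computeIfAbsent(name, k -> new LongAdder()).add(amount);
    }

    long count(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0 : adder.sum();
    }
}

/** Hypothetical formatter for one structured log line. */
final class StructuredLog {

    /** Renders correlationId and event first, then the remaining fields sorted by key. */
    static String line(String correlationId, String event, Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("correlationId=").append(correlationId);
        sb.append(" event=").append(event);
        new TreeMap<String, Object>(fields)
                .forEach((k, v) -> sb.append(' ').append(k).append('=').append(v));
        return sb.toString();
    }
}
```

The same correlation ID would be generated (or read from an incoming header) once per request and passed to every upstream call, so that log lines from A, B, and C can be joined later.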
Deliverables
- Interface definitions for clients, retry template, and service layer.
- Concurrency flow description.
- Sample error-handling logic and example responses.
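As a starting point for the deliverables, the interfaces and response type could be shaped like this. Every name here (UpstreamClient, Retryer, AggregationService, AggregateResponse) is a hypothetical choice, payloads are simplified to strings, and the hand-written JSON rendering escapes only quotes and backslashes. A real service would use a JSON library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;

/** One upstream dependency (A, B, or C). */
interface UpstreamClient {
    String name();
    CompletableFuture<String> fetch(String correlationId);
}

/** Retry abstraction that accepts a Callable, as the requirements ask. */
interface Retryer {
    <T> T execute(Callable<T> task) throws Exception;
}

/** Service layer behind the aggregate endpoint. */
interface AggregationService {
    CompletableFuture<AggregateResponse> aggregate(String correlationId);
}

/** Combined result: data per upstream, plus an error entry for each upstream that failed. */
final class AggregateResponse {
    final String correlationId;
    final Map<String, String> data = new LinkedHashMap<>();
    final Map<String, String> errors = new LinkedHashMap<>();

    AggregateResponse(String correlationId) {
        this.correlationId = correlationId;
    }

    boolean isPartial() {
        return !errors.isEmpty();
    }

    /** Minimal JSON rendering; assumes non-null values without control characters. */
    String toJson() {
        return "{\"correlationId\":" + quote(correlationId)
                + ",\"partial\":" + isPartial()
                + ",\"data\":" + object(data)
                + ",\"errors\":" + object(errors) + "}";
    }

    private static String object(Map<String, String> map) {
        List<String> parts = new ArrayList<>();
        map.forEach((k, v) -> parts.add(quote(k) + ":" + quote(v)));
        return "{" + String.join(",", parts) + "}";
    }

    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Under WAIT_ALL, a request in which C timed out would fill data for A and B, put a default value under C, and record the reason under errors, so the caller can tell a partial response from a complete one through the partial flag.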
