System Design Task: Resilient Multi‑Provider LLM Client Library
Context
You are designing a client library used by backend services to call external Large Language Model (LLM) providers (e.g., OpenAI, Anthropic). The library must route requests across multiple providers to maximize availability, control cost, and meet latency SLOs.
Requirements
- Provider management
  - Provider registration and unregistration
  - Capability mapping: available models, max context length, supported features (streaming, function calling, JSON mode, etc.); see the registry sketch after this list
- Governance
  - Per-provider rate limiting, including per-API-key limits where applicable (see the token-bucket sketch after this list)
  - Quotas (per provider, per model, per tenant)
- Resilience
  - Health checks (active and passive)
  - Timeouts (connect, request, total)
  - Circuit breakers with fail-fast behavior and half-open probes (sketched after this list)
- Routing
  - Cost- and latency-aware load balancing (sketched after this list)
  - Retry and fallback across providers when one is degraded or down
- Observability
  - Metrics, structured logs, and distributed tracing
- Security
  - Secure key management (storage, rotation, scoping, redaction)
- Concurrency
  - Thread-safe and high-throughput; supports parallel requests and streaming (see the fan-out sketch after this list)
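To make the capability-mapping requirement concrete, here is a minimal sketch of a provider interface and a thread-safe registry. All names (`Provider`, `ModelCapability`, `ProviderRegistry`, the pricing fields) are illustrative assumptions, not a prescribed API.

```python
import threading
from dataclasses import dataclass
from enum import Enum, auto
from typing import Protocol


class Feature(Enum):
    STREAMING = auto()
    FUNCTION_CALLING = auto()
    JSON_MODE = auto()


@dataclass(frozen=True)
class ModelCapability:
    model: str
    max_context_tokens: int
    features: frozenset[Feature]
    usd_per_1k_input_tokens: float   # assumed pricing fields, used later for cost-aware routing
    usd_per_1k_output_tokens: float


class Provider(Protocol):
    """Illustrative provider interface; method names are assumptions."""
    name: str

    def capabilities(self) -> list[ModelCapability]: ...
    def complete(self, model: str, prompt: str, timeout_s: float) -> str: ...


class ProviderRegistry:
    """Thread-safe provider registration, unregistration, and capability lookup."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._providers: dict[str, Provider] = {}

    def register(self, provider: Provider) -> None:
        with self._lock:
            self._providers[provider.name] = provider

    def unregister(self, name: str) -> None:
        with self._lock:
            self._providers.pop(name, None)

    def providers_for(self, model: str, required: set[Feature]) -> list[Provider]:
        """Providers that serve `model` and support every feature in `required`."""
        with self._lock:
            return [
                p for p in self._providers.values()
                if any(c.model == model and required <= c.features
                       for c in p.capabilities())
            ]
```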
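Per-provider (or per-API-key) rate limiting is commonly implemented as a token bucket. The sketch below is one minimal, thread-safe version; `rate_per_s` and `burst` are assumed tuning parameters, and a real design would keep one bucket per (provider, key) pair.

```python
import threading
import time


class TokenBucket:
    """Simple token-bucket limiter; one instance per (provider, api_key)."""

    def __init__(self, rate_per_s: float, burst: int) -> None:
        self._rate = rate_per_s
        self._capacity = float(burst)
        self._tokens = float(burst)
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Non-blocking: returns False when the caller should back off or queue."""
        with self._lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at the burst size.
            self._tokens = min(self._capacity,
                               self._tokens + (now - self._last) * self._rate)
            self._last = now
            if self._tokens >= tokens:
                self._tokens -= tokens
                return True
            return False
```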
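A circuit breaker trips open after consecutive failures, fails fast while open, and lets a probe request through after a cooldown (the half-open state). The sketch below is a deliberately simplified state machine; the threshold, cooldown, and single-probe policy are assumptions a real design would refine per provider.

```python
import threading
import time


class CircuitBreaker:
    """Closed -> open on consecutive failures; half-open probe after a cooldown."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout_s = reset_timeout_s
        self._failures = 0
        self._opened_at = None  # monotonic time when the circuit tripped, or None if closed
        self._lock = threading.Lock()

    def allow_request(self) -> bool:
        with self._lock:
            if self._opened_at is None:
                return True  # closed: traffic flows normally
            if time.monotonic() - self._opened_at >= self._reset_timeout_s:
                return True  # half-open: let a probe through
            return False     # open: fail fast without calling the provider

    def record_success(self) -> None:
        with self._lock:
            self._failures = 0
            self._opened_at = None  # probe succeeded: close the circuit

    def record_failure(self) -> None:
        with self._lock:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = time.monotonic()  # trip (or re-trip) the circuit open
```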
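For cost- and latency-aware routing with cross-provider fallback, one simple approach is to rank candidates by a blended score (for example, weighted price plus an observed latency estimate such as an EWMA or p95) and walk the list with jittered backoff. The `score` and `call` callables below are hypothetical stand-ins for the library's real scoring and transport layers.

```python
import random
import time
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")  # provider handle
R = TypeVar("R")  # response type


def route_with_fallback(
    candidates: Iterable[P],
    score: Callable[[P], float],
    call: Callable[[P], R],
    max_attempts: int = 3,
) -> R:
    """Try candidates in score order (lower is better), falling back on failure."""
    ordered = sorted(candidates, key=score)
    last_error = None
    for attempt, provider in enumerate(ordered[:max_attempts]):
        try:
            return call(provider)
        except Exception as e:  # in practice: distinguish retryable from fatal errors
            last_error = e
            # Jittered exponential backoff before falling back to the next provider.
            time.sleep(min(1.0, 0.1 * 2 ** attempt) * random.random())
    raise RuntimeError("all providers failed") from last_error
```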
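For the concurrency requirement, a semaphore-capped async fan-out is one common pattern for issuing many requests in parallel without overwhelming a provider; `complete` here is an assumed async entry point into the library, not a defined API.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def fan_out(
    prompts: Sequence[str],
    complete: Callable[[str], Awaitable[str]],
    max_concurrency: int = 8,
) -> list[str]:
    """Run many completions in parallel, capped at `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(prompt: str) -> str:
        async with semaphore:
            return await complete(prompt)

    return await asyncio.gather(*(one(p) for p in prompts))
```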
Deliverables
- Describe the interfaces and data structures
- Explain request routing, failure handling, and the concurrency strategy
- Include assumptions where needed