Design an AI gateway that sits between internal services and multiple LLM providers such as OpenAI and Azure OpenAI.
The gateway should expose a unified interface to internal applications and handle provider routing. Two key requirements were emphasized:
- Decide when to fall back from one provider to another.
- Produce logging and analytics that business stakeholders can use.
Discuss:
- the high-level architecture and major components
- how requests are routed to a primary provider and when fallback is triggered (a routing sketch follows this list)
- what online metrics should drive fallback decisions, such as latency, error rate, rate-limit rejections, quality or schema-validation failures, and cost (see the rolling-window sketch below)
- how to support retries, circuit breakers, quotas, and provider health checks (see the circuit-breaker sketch below)
- what should be logged for each request and what aggregated metrics should be exposed to the business team (see the log-record sketch below)
- how to protect sensitive prompt and response data while still giving useful observability (see the redaction sketch below)
- scalability, reliability, and trade-offs
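
A minimal sketch of the routing-and-fallback loop, assuming a hypothetical `Provider` adapter interface and a `ProviderError` that marks failures as retryable or not; real provider SDKs would sit behind these adapters:

```python
from typing import Protocol


class ProviderError(Exception):
    """Raised by a provider adapter on timeouts, 429s, or 5xx (hypothetical type)."""

    def __init__(self, retryable: bool = True):
        super().__init__()
        self.retryable = retryable


class Provider(Protocol):
    name: str

    def complete(self, prompt: str) -> str: ...


def route(providers: list[Provider], prompt: str) -> str:
    """Try providers in priority order; fall back only on retryable failures."""
    last_error: Exception | None = None
    for provider in providers:  # primary first, then fallbacks
        try:
            return provider.complete(prompt)
        except ProviderError as err:
            if not err.retryable:
                raise  # e.g. a malformed request: no point retrying elsewhere
            last_error = err  # note the failure and try the next provider
    raise RuntimeError("all providers failed") from last_error
```

In a real gateway the provider ordering would itself be derived from the health signals below rather than from a static list.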
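
To make that fallback decision data-driven, each provider can keep a rolling window of recent outcomes; the window size and thresholds below are illustrative assumptions, not recommendations:

```python
from collections import deque


class ProviderHealth:
    """Rolling window of recent call outcomes for one provider (illustrative)."""

    def __init__(self, window: int = 100, max_error_rate: float = 0.2,
                 max_p95_latency_s: float = 5.0):
        # Each entry is (succeeded, latency_seconds); old entries fall off.
        self.outcomes: deque[tuple[bool, float]] = deque(maxlen=window)
        self.max_error_rate = max_error_rate
        self.max_p95_latency_s = max_p95_latency_s

    def record(self, ok: bool, latency_s: float) -> None:
        self.outcomes.append((ok, latency_s))

    def healthy(self) -> bool:
        if len(self.outcomes) < 10:  # too little data: assume healthy
            return True
        error_rate = sum(1 for ok, _ in self.outcomes if not ok) / len(self.outcomes)
        latencies = sorted(lat for _, lat in self.outcomes)
        p95 = latencies[int(0.95 * (len(latencies) - 1))]
        return error_rate <= self.max_error_rate and p95 <= self.max_p95_latency_s
```

Schema-validation failures and cost overruns can be recorded as failures in the same window, so one `healthy()` check covers all the listed signals.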
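
Retries and health checks then hang off a per-provider circuit breaker; the closed/open/half-open state machine below is the standard pattern, with thresholds picked only for illustration:

```python
import time


class CircuitBreaker:
    """Closed -> open after repeated failures; half-open probe after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: pass traffic through
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            return True  # half-open: allow traffic to probe recovery
        return False  # open: shed traffic, route to a fallback provider

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit again

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # trip the breaker
```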
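
For the per-request log, one structured record per call is usually enough to derive the business aggregates (cost per team, p95 latency per provider, fallback rate); the field names here are assumptions, not a fixed schema:

```python
from dataclasses import asdict, dataclass
import json
import time
import uuid


@dataclass
class RequestLog:
    """One structured record per gateway request (field names are illustrative)."""
    request_id: str
    team: str                  # internal caller, for cost attribution
    provider: str              # provider that actually served the request
    model: str
    fallback_used: bool
    latency_ms: float
    input_tokens: int
    output_tokens: int
    estimated_cost_usd: float
    status: str                # "ok" | "error" | "schema_invalid"
    timestamp: float


# Example record; in practice this would be shipped to a log pipeline.
record = RequestLog(
    request_id=str(uuid.uuid4()), team="search", provider="azure-openai",
    model="gpt-4o", fallback_used=False, latency_ms=812.0,
    input_tokens=420, output_tokens=96, estimated_cost_usd=0.0031,
    status="ok", timestamp=time.time(),
)
print(json.dumps(asdict(record)))
```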
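
Finally, raw prompts and responses can stay out of general-purpose analytics by logging a hash, a length, and a masked preview instead; the `redact()` patterns are deliberately simplistic placeholders for a real PII policy:

```python
import hashlib
import re


def redact(text: str) -> str:
    """Mask obvious PII patterns before logging (illustrative, not exhaustive)."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", text)      # email addresses
    text = re.sub(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b", "<SSN>", text)  # SSN-like numbers
    return text


def observable_view(prompt: str) -> dict:
    """What the analytics pipeline sees instead of the raw prompt."""
    return {
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt_chars": len(prompt),
        "prompt_redacted": redact(prompt)[:200],  # truncated, masked preview
    }
```

Full bodies, if retained at all, would live in a separate short-retention store with stricter access controls than the analytics pipeline.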