This question evaluates a candidate's ability to design a multi-tenant, reliable, and scalable backend for asynchronous LLM prompt processing. It covers API design, job orchestration, model routing, prompt versioning, idempotency, retries and dead-letter queues (DLQ), result storage, and observability, as well as non-functional concerns such as cost control, security, and SLOs.

Design a multi-tenant backend that processes large language model (LLM) prompts asynchronously. Clients submit prompts via an API and later poll for status/results or receive callbacks via webhooks. The system must support reliability, scale, and cost controls.
Describe the architecture, data flows, and key design choices. Provide concrete API designs and operational policies.
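As a starting point, the submit/poll contract with idempotent resubmission and a retry-then-DLQ policy can be sketched in a few dozen lines. This is a minimal in-memory illustration, not a reference answer: the names (`JobService`, `submit`, `process`, `MAX_ATTEMPTS`) and the choice to key idempotency on `(tenant_id, idempotency_key)` are assumptions made for the example, not part of the question.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    DEAD_LETTER = "dead_letter"


@dataclass
class Job:
    job_id: str
    tenant_id: str
    prompt: str
    status: Status = Status.QUEUED
    attempts: int = 0
    result: Optional[str] = None


class JobService:
    """Illustrative job store: idempotent submit, status polling, retry + DLQ."""

    MAX_ATTEMPTS = 3  # assumed retry budget before dead-lettering

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}
        # Idempotency is scoped per tenant so keys cannot collide across tenants.
        self._idempotency: Dict[Tuple[str, str], str] = {}
        self._dlq: List[str] = []

    def submit(self, tenant_id: str, prompt: str, idempotency_key: str) -> Job:
        """Return the existing job on duplicate submission, else enqueue a new one."""
        dedup = (tenant_id, idempotency_key)
        if dedup in self._idempotency:
            return self._jobs[self._idempotency[dedup]]
        job = Job(job_id=str(uuid.uuid4()), tenant_id=tenant_id, prompt=prompt)
        self._jobs[job.job_id] = job
        self._idempotency[dedup] = job.job_id
        return job

    def status(self, job_id: str) -> Status:
        """Polling endpoint: report the job's current lifecycle state."""
        return self._jobs[job_id].status

    def process(self, job_id: str, model_call: Callable[[str], str]) -> Job:
        """Attempt the job up to MAX_ATTEMPTS times, then move it to the DLQ."""
        job = self._jobs[job_id]
        while job.attempts < self.MAX_ATTEMPTS:
            job.attempts += 1
            job.status = Status.RUNNING
            try:
                job.result = model_call(job.prompt)
                job.status = Status.SUCCEEDED
                return job
            except Exception:
                job.status = Status.FAILED  # transient failure; loop retries
        job.status = Status.DEAD_LETTER
        self._dlq.append(job.job_id)
        return job


if __name__ == "__main__":
    svc = JobService()
    first = svc.submit("tenant-a", "Summarize this document.", "req-123")
    retry = svc.submit("tenant-a", "Summarize this document.", "req-123")
    assert first.job_id == retry.job_id  # duplicate submit returns the same job
```

A production design would replace the in-memory dicts with a durable queue and database, add webhook delivery on terminal states, and enforce per-tenant quotas; the sketch only fixes the shape of the API contract the question asks for.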