System Design: Background Processing Backend for LLM Prompts
Context
Design a multi-tenant backend that processes large language model (LLM) prompts asynchronously. Clients submit prompts via an API and later poll for status and results, or receive results via webhook callbacks. The system must be reliable, scale with load, and enforce cost controls.
Requirements
- APIs: submit prompts (with idempotency keys), poll job status, fetch results, register webhooks/callbacks. (Sketch below.)
- Job orchestration: queueing, prioritization (e.g., realtime vs. bulk), worker pools, retries, dead-letter queues (DLQ). (Sketch below.)
- Model routing: route requests to an appropriate model/provider based on policy (latency/cost/quality/capacity). (Sketch below.)
- Prompt versioning: manage template versions and record the exact prompt/model context used, for reproducibility. (Sketch below.)
- Idempotency: ensure duplicate submissions do not create duplicate work or charges. (Sketch below.)
- Retries and DLQ: automatic retry with backoff; poison-message handling. (Sketch below.)
- Result storage: store inputs, outputs, and metadata; enable polling and callback delivery; set retention policies. (Sketch below.)
- Observability: metrics, logs, traces; per-tenant dashboards, alerting, audits. (Sketch below.)
- Non-functionals: scaling and capacity planning, cost control, rate limiting, PII/security, and SLAs/SLOs. (Rate-limiting sketch below.)
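A minimal sketch of the API surface, assuming JSON over HTTP. The paths, field names, and job states below are assumptions, not a fixed contract:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


@dataclass
class SubmitRequest:
    tenant_id: str
    prompt_template_id: str           # versioned template reference
    variables: dict                   # values bound into the template
    idempotency_key: str              # client-supplied; dedupes retried submits
    priority: str = "bulk"            # "realtime" | "bulk"
    webhook_url: Optional[str] = None


@dataclass
class JobStatusResponse:
    job_id: str
    state: JobState
    result_uri: Optional[str] = None  # populated once the job succeeds
    error: Optional[str] = None


# Route table (illustrative):
#   POST /v1/jobs                  -> submit; 202 Accepted with job_id
#   GET  /v1/jobs/{job_id}         -> poll status
#   GET  /v1/jobs/{job_id}/result  -> fetch result (404 until ready)
#   POST /v1/jobs/{job_id}/cancel  -> request cancellation (see follow-up)
#   POST /v1/webhooks              -> register a callback endpoint
```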
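For orchestration, one option is strict two-queue priority in each worker: drain realtime work first, then fall back to bulk. The in-process queues below stand in for a real broker (e.g., SQS or RabbitMQ), and `process`/`requeue_or_dead_letter` are placeholders:

```python
import queue
import threading

realtime_q: "queue.Queue[dict]" = queue.Queue()
bulk_q: "queue.Queue[dict]" = queue.Queue()


def process(job: dict) -> None:
    """Placeholder for the actual model-provider call."""
    print("processing", job)


def requeue_or_dead_letter(job: dict) -> None:
    """Placeholder; see the retry/DLQ sketch below."""
    print("failed", job)


def next_job(timeout: float = 1.0):
    # Strict priority: take realtime work first, then bulk.
    try:
        return realtime_q.get_nowait()
    except queue.Empty:
        pass
    try:
        return bulk_q.get(timeout=timeout)
    except queue.Empty:
        return None


def worker_loop(stop: threading.Event) -> None:
    # One such loop runs per worker; pool size is a capacity knob.
    while not stop.is_set():
        job = next_job()
        if job is None:
            continue
        try:
            process(job)
        except Exception:
            requeue_or_dead_letter(job)
```

Strict priority can starve bulk traffic under sustained realtime load; a weighted or deficit-round-robin draining policy avoids that at the cost of slightly higher realtime latency.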
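A policy-driven router can filter candidates by availability and a quality floor, then rank by the dimension the priority class cares about. The model names, prices, and latencies here are invented:

```python
from dataclasses import dataclass


@dataclass
class ModelTarget:
    name: str
    cost_per_1k_tokens: float
    p95_latency_ms: int
    quality_tier: int          # higher is better
    available: bool = True     # flipped off on provider outage/backpressure


# Illustrative candidates; real values would come from config + health checks.
CANDIDATES = [
    ModelTarget("provider-a/large", 0.030, 1800, 3),
    ModelTarget("provider-a/small", 0.002, 400, 1),
    ModelTarget("provider-b/medium", 0.010, 900, 2),
]


def route(priority: str, min_quality: int) -> ModelTarget:
    eligible = [m for m in CANDIDATES
                if m.available and m.quality_tier >= min_quality]
    if not eligible:
        raise RuntimeError("no capacity satisfies the routing policy")
    if priority == "realtime":
        # Realtime jobs optimize for latency.
        return min(eligible, key=lambda m: m.p95_latency_ms)
    # Bulk jobs optimize for cost.
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)
```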
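For reproducibility, one approach is to publish templates as immutable versions and persist, per job, the pinned version plus a hash of the exact rendered prompt. Field names are assumptions:

```python
import hashlib


def render_and_pin(template_id: str, version: int,
                   template: str, variables: dict) -> dict:
    # Simple substitution stands in for a real templating engine.
    rendered = template.format(**variables)
    return {
        "template_id": template_id,
        "template_version": version,  # immutable once published
        "variables": variables,
        "rendered_prompt": rendered,
        # Hash lets you verify the stored prompt on replay/audit.
        "prompt_sha256": hashlib.sha256(rendered.encode()).hexdigest(),
    }
```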
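Idempotency can hinge on a uniqueness guarantee over (tenant_id, idempotency_key). The in-memory map below only illustrates the contract; production would use a database unique index or an atomic Redis `SET NX` with a TTL, since the check-then-set here is racy across processes:

```python
# Stand-in for a unique (tenant_id, idempotency_key) constraint.
_seen: dict[tuple[str, str], str] = {}


def submit_once(tenant_id: str, idem_key: str, create_job) -> str:
    """Return the existing job_id for a duplicate submit; otherwise create.

    `create_job` is a hypothetical zero-arg callable that enqueues the job
    and returns its id.
    """
    key = (tenant_id, idem_key)
    if key in _seen:
        return _seen[key]   # same job_id back; no duplicate work or charge
    job_id = create_job()
    _seen[key] = job_id
    return job_id
```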
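Retries typically use capped exponential backoff with full jitter, and route poison messages to the DLQ once an attempt budget is exhausted. The budget and delays below are illustrative:

```python
import random
import time

MAX_ATTEMPTS = 5
BASE_DELAY_S = 1.0


def backoff_delay(attempt: int) -> float:
    """Full-jitter exponential backoff, capped at 60 seconds."""
    return random.uniform(0.0, min(60.0, BASE_DELAY_S * 2 ** attempt))


def run_with_retries(job: dict, call, dead_letter) -> None:
    """`call` does the work; `dead_letter` is a hypothetical DLQ publisher."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            call(job)
            return
        except Exception as exc:
            if attempt == MAX_ATTEMPTS - 1:
                # Attempt budget exhausted: treat as a poison message.
                dead_letter(job, reason=str(exc))
                return
            time.sleep(backoff_delay(attempt))
```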
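A possible result-store shape, written as SQLite DDL for brevity; a production system would likely keep large outputs in blob storage and persist only pointers. Column names are assumptions:

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS job_results (
    job_id        TEXT PRIMARY KEY,
    tenant_id     TEXT NOT NULL,
    prompt_sha256 TEXT NOT NULL,  -- links back to the pinned prompt
    output        TEXT,           -- or a pointer into blob storage
    tokens_in     INTEGER,
    tokens_out    INTEGER,
    created_at    TEXT NOT NULL,
    expires_at    TEXT NOT NULL   -- drives retention-policy cleanup
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
```

A periodic job that deletes rows past `expires_at` (and their blobs) is one way to enforce the retention policy.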
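Per-tenant observability starts with consistently labeled metrics (tenant, model, outcome). The stdlib stand-in below illustrates only the labeling scheme; a real deployment would use a metrics client such as a Prometheus or StatsD library:

```python
import collections
import time

counters: collections.Counter = collections.Counter()
latencies: dict[str, list[float]] = collections.defaultdict(list)


def observe_job(tenant_id: str, model: str, started: float, ok: bool) -> None:
    # Labels in the metric name mimic Prometheus-style label sets.
    outcome = "ok" if ok else "error"
    counters[f"jobs_total{{tenant={tenant_id},model={model},outcome={outcome}}}"] += 1
    latencies[f"job_latency_s{{tenant={tenant_id}}}"].append(time.time() - started)
```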
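Among the non-functionals, per-tenant rate limiting is often a token bucket: capacity bounds the burst, the refill rate bounds sustained throughput. The rates here are illustrative:

```python
import time


class TokenBucket:
    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s        # sustained requests per second
        self.capacity = burst         # maximum burst size
        self.tokens = float(burst)
        self.stamp = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.stamp) * self.rate)
        self.stamp = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Usage sketch: one bucket per tenant, e.g. 5 req/s sustained, bursts of 20.
bucket = TokenBucket(rate_per_s=5.0, burst=20)
if not bucket.allow():
    pass  # reject with HTTP 429 and a Retry-After hint
```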
Follow-up
- Support streaming of partial outputs and cancellation of in-flight jobs. (Sketch below.)
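Streaming and cancellation compose naturally in one loop: the worker checks a cancel flag between chunks and publishes partial deltas as they arrive. `publish` and the event shapes are assumptions (e.g., delivered over SSE or WebSockets):

```python
import threading
from typing import Iterator


def stream_job(chunks: Iterator[str], cancel: threading.Event,
               publish) -> str:
    """`chunks` stands in for the provider's token stream; `publish` is a
    hypothetical callback that forwards events to the client."""
    parts: list[str] = []
    for chunk in chunks:
        if cancel.is_set():
            # Flush whatever was produced before the cancel landed.
            publish({"event": "cancelled", "partial": "".join(parts)})
            return "cancelled"
        parts.append(chunk)
        publish({"event": "delta", "text": chunk})
    publish({"event": "done", "text": "".join(parts)})
    return "succeeded"
```

The cancel endpoint from the API sketch would set this flag (e.g., via a shared store keyed by job_id), so cancellation takes effect at the next chunk boundary rather than killing the worker.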
Describe the architecture, data flows, and key design choices. Provide concrete API designs and operational policies.