Design a prompt processing backend

Q: Design a prompt processing backend

This question evaluates a candidate's ability to design a multi-tenant, reliable, and scalable backend for asynchronous LLM prompt processing, covering API design, job orchestration, model routing, prompt versioning, idempotency, retries/DLQ, result storage, observability, and non-functional concerns like cost control, security, and SLOs.

Question

System Design: Background Processing Backend for LLM Prompts

Context

Design a multi-tenant backend that processes large language model (LLM) prompts asynchronously. Clients submit prompts via an API and later poll for status/results or receive callbacks via webhooks. The system must support reliability, scale, and cost controls.

Requirements

APIs
- Submit prompts (with idempotency keys), poll job status, fetch results, register webhooks/callbacks.
Job orchestration
- Queueing, prioritization (e.g., realtime vs bulk), worker pools, retries, dead-letter queues (DLQ).
Model routing
- Route requests to appropriate model/provider based on policy (latency/cost/quality/capacity).
Prompt versioning
- Manage template versions and the exact prompt/model context used for reproducibility.
Idempotency
- Ensure duplicate submissions do not create duplicate work/charges.
Retries and DLQ
- Automatic retry with backoff; poison message handling.
Result storage
- Store inputs/outputs/metadata, enable polling and callback delivery; set retention policies.
Observability
- Metrics, logs, traces; per-tenant dashboards, alerting, audits.
Non-functionals
- Scaling and capacity planning, cost control, rate limiting, PII/security, and SLAs/SLOs.
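
The idempotency and retry/DLQ requirements above can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: `JobStore`, `TransientError`, `backoff_delay`, and `process` are hypothetical names, the dictionaries stand in for a durable job table and queue, and full-jitter exponential backoff is one common policy rather than the required one.

```python
import random
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    prompt: str
    status: str = "queued"  # queued | succeeded | dead_lettered
    attempts: int = 0
    result: Optional[str] = None


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, throttling, 5xx)."""


class JobStore:
    """In-memory stand-in for a durable job table plus a dead-letter queue."""

    def __init__(self):
        self.jobs = {}    # job_id -> Job
        self.by_key = {}  # (tenant_id, idempotency_key) -> job_id
        self.dlq = []     # job_ids that exhausted their retries

    def submit(self, tenant_id, idempotency_key, prompt):
        # Keys are scoped per tenant so two tenants can reuse the same key.
        key = (tenant_id, idempotency_key)
        if key in self.by_key:
            # Duplicate submission: return the existing job, create no new work.
            return self.jobs[self.by_key[key]]
        job = Job(job_id=f"job-{len(self.jobs) + 1}", prompt=prompt)
        self.jobs[job.job_id] = job
        self.by_key[key] = job.job_id
        return job


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def process(store, job, call_model, max_attempts=3, sleep=time.sleep):
    """Run a job, retrying transient failures, then dead-letter it."""
    while job.attempts < max_attempts:
        job.attempts += 1
        try:
            job.result = call_model(job.prompt)
            job.status = "succeeded"
            return job
        except TransientError:
            if job.attempts < max_attempts:
                sleep(backoff_delay(job.attempts))
    # Retries exhausted: park the job for inspection instead of looping forever.
    job.status = "dead_lettered"
    store.dlq.append(job.job_id)
    return job
```

In a real deployment the key lookup and job insert would need to be atomic (for example, a unique constraint on tenant and key), and non-retryable errors such as validation failures would likely skip retries and go straight to the DLQ.
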

Follow-up

Support streaming partial outputs and cancellation of in-flight jobs.
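
One way to reason about the follow-up is a worker loop that forwards partial output as sequenced events and checks a cancellation flag between chunks. The sketch below assumes the provider exposes output as an iterable of text chunks; `stream_with_cancel` and the event shape (`seq`, `type`, `text`) are hypothetical and not part of any specific provider API.

```python
import threading
from typing import Iterable, Iterator


def stream_with_cancel(chunks: Iterable[str], cancel: threading.Event) -> Iterator[dict]:
    """Yield partial-output events until the source ends or cancel is set."""
    seq = 0
    for chunk in chunks:
        if cancel.is_set():
            # Stop forwarding; the caller should also abort the upstream request.
            yield {"seq": seq, "type": "cancelled"}
            return
        yield {"seq": seq, "type": "delta", "text": chunk}
        seq += 1
    yield {"seq": seq, "type": "done"}


# Usage: cancel after the second delta has been delivered.
cancel = threading.Event()
events = []
for event in stream_with_cancel(["Hel", "lo", " world"], cancel):
    events.append(event)
    if event["seq"] == 1:
        cancel.set()
```

Sequence numbers let a client that reconnects resume from the last event it saw. In a distributed setup the in-process `threading.Event` would probably be replaced by a shared flag (for example, a job-status column or a pub/sub signal) that the worker polls between chunks.
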

Describe the architecture, data flows, and key design choices. Provide concrete API designs and operational policies.
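
As an example of one such design choice, policy-based model routing (latency/cost/quality/capacity) can be sketched as a filter followed by a cost-ordered pick. This is a minimal illustration under stated assumptions: `ModelOption`, its fields, and `route` are hypothetical, and the model names and numbers in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_per_1k_tokens: float
    p95_latency_ms: int
    quality_tier: int        # higher is better
    available_capacity: int  # remaining concurrent slots


def route(options: List[ModelOption], min_quality: int,
          max_latency_ms: int) -> Optional[ModelOption]:
    """Pick the cheapest option that meets quality, latency, and capacity limits."""
    eligible = [
        o for o in options
        if o.quality_tier >= min_quality
        and o.p95_latency_ms <= max_latency_ms
        and o.available_capacity > 0
    ]
    if not eligible:
        return None  # caller may queue the job, degrade the policy, or reject
    # Cheapest first; break cost ties by lower latency.
    return min(eligible, key=lambda o: (o.cost_per_1k_tokens, o.p95_latency_ms))


# Usage with made-up options.
options = [
    ModelOption("small", 0.2, 400, quality_tier=1, available_capacity=10),
    ModelOption("medium", 1.0, 900, quality_tier=2, available_capacity=0),
    ModelOption("large", 5.0, 2500, quality_tier=3, available_capacity=4),
]
bulk_choice = route(options, min_quality=1, max_latency_ms=5000)
strict_choice = route(options, min_quality=2, max_latency_ms=5000)
```

Here the bulk request lands on the cheapest model, while the stricter request skips "medium" because it has no capacity and falls through to "large". Recording the chosen model alongside the prompt template version on the job record supports the reproducibility requirement.
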
