
Design a prompt processing backend

Last updated: Mar 29, 2026

Quick Overview

This question evaluates how you handle ML product requirements, data/labeling, modeling, serving architecture, evaluation, monitoring, and trade-offs in a realistic interview setting. A strong answer states its assumptions, handles edge cases, explains trade-offs, and shows how to validate the result.

Company: Anthropic

Role: Software Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Onsite

Design a background processing backend for large-language-model prompts. Clients submit prompts via an API and later poll or receive callbacks with results. Specify APIs, job queueing and prioritization, worker pools, model routing, prompt versioning, idempotency keys, retries and dead-letter queues, result storage, and observability. Address scaling, cost control, rate limiting, PII/security, and SLAs. Follow-up: support streaming partial outputs and cancellation.

Jul 26, 2025, 12:00 AM

System Design: Background Processing Backend for LLM Prompts

Context

Design a multi-tenant backend that processes large language model (LLM) prompts asynchronously. Clients submit prompts via an API and later poll for status/results or receive callbacks via webhooks. The system must support reliability, scale, and cost controls.
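One way to make the submit-then-poll flow concrete is a minimal in-memory sketch. Everything here is an assumption for illustration: the `JobStore` class, the `Job` fields, and the status names are hypothetical, and a real system would back this with a durable database and a unique constraint on `(tenant_id, idempotency_key)`.

```python
import hashlib
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    tenant_id: str
    prompt: str
    status: str = "queued"          # assumed lifecycle: queued -> running -> succeeded | failed
    result: Optional[str] = None


class JobStore:
    """In-memory stand-in for a durable job table (hypothetical)."""

    def __init__(self) -> None:
        self._jobs: dict[str, Job] = {}
        # (tenant_id, idempotency_key) -> (payload fingerprint, job_id)
        self._idem: dict[tuple[str, str], tuple[str, str]] = {}

    def submit(self, tenant_id: str, idempotency_key: str, prompt: str) -> Job:
        """Create a job, or return the existing one for a repeated key."""
        fingerprint = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        key = (tenant_id, idempotency_key)
        if key in self._idem:
            seen_fingerprint, job_id = self._idem[key]
            if seen_fingerprint != fingerprint:
                # Same key, different payload: reject rather than guess intent.
                raise ValueError("idempotency key reused with a different payload")
            return self._jobs[job_id]
        job = Job(job_id=uuid.uuid4().hex, tenant_id=tenant_id, prompt=prompt)
        self._jobs[job.job_id] = job
        self._idem[key] = (fingerprint, job.job_id)
        return job

    def poll(self, tenant_id: str, job_id: str) -> Optional[Job]:
        """Return the job only if it belongs to the calling tenant."""
        job = self._jobs.get(job_id)
        if job is None or job.tenant_id != tenant_id:
            return None
        return job
```

Scoping the idempotency key by tenant keeps one tenant's keys from colliding with another's, and the tenant check in `poll` is the smallest possible form of multi-tenant isolation. Key expiry and result retention are left out of this sketch.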

Requirements

  1. APIs
    • Submit prompts (with idempotency keys), poll job status, fetch results, register webhooks/callbacks.
  2. Job orchestration
    • Queueing, prioritization (e.g., realtime vs bulk), worker pools, retries, dead-letter queues (DLQ).
  3. Model routing
    • Route requests to appropriate model/provider based on policy (latency/cost/quality/capacity).
  4. Prompt versioning
    • Manage template versions and the exact prompt/model context used for reproducibility.
  5. Idempotency
    • Ensure duplicate submissions do not create duplicate work/charges.
  6. Retries and DLQ
    • Automatic retry with backoff; poison message handling.
  7. Result storage
    • Store inputs/outputs/metadata, enable polling and callback delivery; set retention policies.
  8. Observability
    • Metrics, logs, traces; per-tenant dashboards, alerting, audits.
  9. Non-functionals
    • Scaling and capacity planning, cost control, rate limiting, PII/security, and SLAs/SLOs.
  10. Follow-up
    • Support streaming partial outputs and cancellation of in-flight jobs.
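For the streaming and cancellation follow-up, a small sketch can show the shape of the event stream. The event fields (`seq`, `type`, `text`) and the function name `stream_completion` are assumptions, not part of the question; the token source stands in for a model backend, and `threading.Event` stands in for whatever cancellation signal the real system would propagate to workers.

```python
import threading
from typing import Iterable, Iterator


def stream_completion(tokens: Iterable[str], cancel: threading.Event) -> Iterator[dict]:
    """Yield partial-output events until the source ends or `cancel` is set.

    Each event carries a sequence number so a client that reconnects can
    ask to resume after the last `seq` it saw (resume logic not shown).
    """
    seq = 0
    for token in tokens:
        # Check the cancellation flag between tokens, before emitting more output.
        if cancel.is_set():
            yield {"seq": seq, "type": "cancelled"}
            return
        yield {"seq": seq, "type": "delta", "text": token}
        seq += 1
    yield {"seq": seq, "type": "done"}
```

A consumer that calls `cancel.set()` after receiving some deltas gets a terminal `cancelled` event instead of `done`. In a real deployment the worker would also need to stop the upstream model call so that cancelled jobs stop accruing cost.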

Describe the architecture, data flows, and key design choices. Provide concrete API designs and operational policies.
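As one possible operational policy for queueing, retries, and the DLQ, the sketch below uses two heaps: one for jobs that are ready to run, ordered by priority, and one for retries waiting out their backoff. The names (`Scheduler`, `QueuedJob`, `TransientError`) and the constants (three attempts, one-second base delay, two priority classes) are assumptions chosen for illustration. Time is passed in explicitly so the behaviour is easy to reason about.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable

REALTIME, BULK = 0, 1      # assumed priority classes; lower number is served first
MAX_ATTEMPTS = 3           # assumed retry budget before dead-lettering
BASE_DELAY_S = 1.0         # assumed base for exponential backoff


class TransientError(Exception):
    """Retryable failure, e.g. a timeout or throttling response from a model backend."""


@dataclass
class QueuedJob:
    job_id: str
    priority: int
    attempts: int = 0


class Scheduler:
    def __init__(self) -> None:
        self._seq = itertools.count()   # tie-breaker: FIFO within a priority class
        self._ready: list[tuple[int, int, QueuedJob]] = []
        self._delayed: list[tuple[float, int, QueuedJob]] = []
        self.dead_letter: list[QueuedJob] = []

    def enqueue(self, job: QueuedJob) -> None:
        heapq.heappush(self._ready, (job.priority, next(self._seq), job))

    def run_ready(self, now: float, handler: Callable[[QueuedJob], None]) -> list[str]:
        """Run every job that is ready at time `now`; return IDs that succeeded."""
        # Promote retries whose backoff has elapsed.
        while self._delayed and self._delayed[0][0] <= now:
            _, _, job = heapq.heappop(self._delayed)
            self.enqueue(job)
        done: list[str] = []
        while self._ready:
            _, _, job = heapq.heappop(self._ready)
            job.attempts += 1
            try:
                handler(job)
                done.append(job.job_id)
            except TransientError:
                if job.attempts >= MAX_ATTEMPTS:
                    self.dead_letter.append(job)   # poison message: stop retrying
                else:
                    delay = BASE_DELAY_S * 2 ** (job.attempts - 1)
                    heapq.heappush(self._delayed, (now + delay, next(self._seq), job))
        return done
```

With these constants a job that always fails is retried after 1 s and then 2 s more, and lands in `dead_letter` on its third attempt. Jitter, per-tenant fairness, and non-retryable error classes are deliberately omitted; a production design would likely add all three.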

Constraints & Assumptions

  • Preserve the scope, facts, inputs, and requested outputs from the prompt above.
  • If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
  • Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.

Clarifying Questions to Ask

  • Clarify users, core use cases, read/write patterns, scale, latency, availability, and data retention.
  • State explicit assumptions before making sizing or architecture decisions.
  • Prioritize the functional path first, then address reliability, security, observability, and rollout.

What a Strong Answer Covers

  • A scoped requirements summary with concrete non-goals and success metrics.
  • ML-specific data, model, evaluation, serving, and monitoring choices.
  • Reasoned trade-offs between simple and scalable designs, including bottlenecks and failure modes.
  • A validation, monitoring, migration, and launch plan appropriate for the risk level.

Follow-up Questions

  • What breaks first at 10x traffic or data volume?
  • How would you degrade gracefully during dependency failures?
  • What metrics and alerts would prove the design is healthy after launch?
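For the graceful-degradation question, one answer is policy-based routing with fallback: try backends in preference order, skip any that exceed a cost ceiling, and fall through to the next when one is unavailable. The sketch below assumes hypothetical names (`Backend`, `BackendUnavailable`, `route_with_fallback`) and a made-up `cost_per_1k_tokens` field used only to illustrate a cost policy.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class BackendUnavailable(Exception):
    """Raised by a backend call when the provider is down or over capacity."""


@dataclass(frozen=True)
class Backend:
    name: str
    cost_per_1k_tokens: float        # hypothetical price, used only for the cost policy
    call: Callable[[str], str]       # stand-in for the real model invocation


def route_with_fallback(
    prompt: str, backends: Sequence[Backend], max_cost: float
) -> tuple[str, str]:
    """Try backends in preference order, skipping any above the cost ceiling.

    Returns (backend_name, output). Raises BackendUnavailable if none succeed.
    """
    errors: list[str] = []
    for backend in backends:
        if backend.cost_per_1k_tokens > max_cost:
            continue  # policy: never exceed the caller's cost ceiling
        try:
            return backend.name, backend.call(prompt)
        except BackendUnavailable as exc:
            errors.append(f"{backend.name}: {exc}")
    raise BackendUnavailable("; ".join(errors) or "no backend within cost ceiling")
```

Returning the backend name alongside the output lets the job record capture which model actually served the request, which matters for reproducibility and billing. A fuller design might add a circuit breaker so that a backend known to be failing is skipped without being called.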
