PracHub

Design a batch inference API

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design asynchronous batch inference APIs and systems, including API schema and job lifecycle design, idempotency semantics, queueing and worker scaling, batching and accelerator utilization, rate limiting, observability, and error handling.


Design a batch inference API

Company: Anthropic

Role: Software Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Onsite

Design an inference service API where clients POST a job and later poll for results. Requirements: accept single or batch inputs; return a job ID on submission; provide status endpoints (queued, running, succeeded, failed); no streaming required. Specify request/response schemas, idempotency keys, timeout and retry behavior, and rate limits. Describe the job queue, workers, and storage of intermediate and final results; how you would scale workers, batch efficiently, and utilize accelerators; and how you would implement observability, error handling, and partial failures within a batch.


Posted: Sep 6, 2025

System Design: Async Inference Service API (POST Job, Poll for Results)

Context

You are designing an asynchronous inference service where clients submit a single item or a batch of items for model inference. The service should immediately acknowledge submission with a job ID and allow clients to poll for status and results later. No streaming of results is required.
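The submit-then-poll flow above implies a client-side loop with backoff. A minimal sketch, assuming a caller-provided `get_status` callable that wraps the HTTP status endpoint (the backoff constants and the status-dict shape are illustrative assumptions):

```python
import time


def poll_job(get_status, job_id: str, timeout_s: float = 300.0,
             base_delay_s: float = 1.0, max_delay_s: float = 30.0) -> dict:
    """Poll until the job reaches a terminal state, with exponential backoff.

    `get_status` is any callable taking a job ID and returning a dict like
    {"job_id": ..., "status": "queued" | "running" | "succeeded" | "failed"}.
    Backoff caps at max_delay_s so long-running jobs don't hammer the API.
    """
    deadline = time.monotonic() + timeout_s
    delay = base_delay_s
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status["status"] in ("succeeded", "failed"):
            return status
        time.sleep(delay)
        delay = min(delay * 2, max_delay_s)
    raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
```

Capped exponential backoff keeps polling cheap for the server while still returning promptly for short jobs; a production client would also honor any `Retry-After` hint the server returns.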

Requirements

  1. API behavior
    • Accept single or batch inputs.
    • On submission, return a job ID immediately.
    • Provide status endpoints with states: queued, running, succeeded, failed.
    • No streaming response is required (polling only).
  2. API design
    • Specify request/response schemas for submission, status, and results.
    • Include idempotency keys and semantics.
    • Define timeout and retry behavior (client and server side).
    • Define rate limits and backpressure behavior.
  3. Architecture
    • Describe the job queue, workers, and storage of inputs, intermediate, and final results.
    • Explain how to scale workers, batch efficiently, and utilize accelerators (e.g., GPUs).
  4. Operability
    • Implement observability (metrics, logs, tracing).
    • Error handling and standardized error schema.
    • Handling of partial failures within a batch.

Assume a typical cloud environment with standard components available (HTTP gateway, object storage, message queues, autoscaling, etc.).
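Two of the requirements above (efficient batching, partial failures within a batch) can be sketched on the worker side. This is one possible approach, not the expected answer: a worker drains a queue into a size- or time-bounded batch to keep the accelerator fed, then records per-item outcomes so a single bad input fails only its own item. `max_batch`, `max_wait_s`, and the result-dict shape are assumptions.

```python
import queue
import time


def gather_batch(q: "queue.Queue", max_batch: int = 8,
                 max_wait_s: float = 0.05) -> list:
    """Pull up to max_batch items, waiting at most max_wait_s to fill the batch.

    Blocks for the first item, then collects more until the batch is full,
    the wait budget is spent, or the queue runs dry — a simple form of
    dynamic batching for accelerator utilization.
    """
    batch = [q.get()]  # block until at least one item is available
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


def run_batch(items: list, infer_one) -> list:
    """Run inference per item, recording per-item success or failure.

    Exceptions are captured into a standardized error object instead of
    failing the whole batch, so the job can report partial results.
    """
    results = []
    for idx, item in enumerate(items):
        try:
            results.append({"index": idx, "status": "succeeded",
                            "output": infer_one(item)})
        except Exception as exc:
            results.append({"index": idx, "status": "failed",
                            "error": {"code": type(exc).__name__,
                                      "message": str(exc)}})
    return results
```

In a real service the per-item loop would be replaced by a vectorized forward pass on the accelerator, with per-item validation happening before batching so malformed inputs never reach the model.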

