Design a batch inference API
Company: Anthropic
Role: Software Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Onsite
Design an inference service API where clients POST a job and later poll for results. Requirements: accept single or batch inputs; return a job ID on submission; provide status endpoints (queued, running, succeeded, failed); no streaming required. Specify request/response schemas, idempotency keys, timeout and retry behavior, and rate limits. Describe the job queue, workers, and storage of intermediate and final results; how you would scale workers, batch efficiently, and keep accelerators utilized; and how you would implement observability and error handling, including partial failures within a batch.
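To make the schema and idempotency requirements concrete, here is a minimal sketch of the submission and status flow, assuming an in-memory store; the names (`JobStore`, `JobRequest`, the field names) and the endpoint shapes are illustrative choices, not a prescribed API.

```python
import time
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

@dataclass
class JobRequest:
    # POST /v1/jobs body: one or many inputs plus a client-supplied idempotency key.
    inputs: list[str]
    model: str
    idempotency_key: str
    timeout_s: int = 3600  # job-level deadline; the server fails the job past this

@dataclass
class JobRecord:
    job_id: str
    status: JobStatus
    created_at: float
    results: Optional[list[dict]] = None  # per-item results, set on completion
    error: Optional[str] = None

class JobStore:
    """Illustrative in-memory store; production would use a durable database."""

    def __init__(self) -> None:
        self._by_id: dict[str, JobRecord] = {}
        self._by_idem_key: dict[str, str] = {}  # idempotency key -> job_id

    def submit(self, req: JobRequest) -> JobRecord:
        # Idempotency: resubmitting the same key returns the original job
        # instead of enqueuing duplicate work, so client retries are safe.
        existing = self._by_idem_key.get(req.idempotency_key)
        if existing is not None:
            return self._by_id[existing]
        record = JobRecord(job_id=str(uuid.uuid4()),
                           status=JobStatus.QUEUED,
                           created_at=time.time())
        self._by_id[record.job_id] = record
        self._by_idem_key[req.idempotency_key] = record.job_id
        return record

    def get(self, job_id: str) -> Optional[JobRecord]:
        # Backs GET /v1/jobs/{job_id}: status, plus results once terminal.
        return self._by_id.get(job_id)
```

Under this sketch, a client would POST to something like /v1/jobs, receive {"job_id": ..., "status": "queued"}, and poll GET /v1/jobs/{job_id} until the status is terminal; rate limits would surface as 429 responses with a Retry-After header, and retries with backoff would apply only to transient (5xx/timeout) failures.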
Quick Answer: Expose an asynchronous job API: POST accepts single or batch inputs with a client-supplied idempotency key and returns a job ID; GET /jobs/{id} reports queued, running, succeeded, or failed and returns results when terminal. Back it with a durable queue and result store, horizontally scaled workers that form dynamic batches to keep accelerators saturated, per-client rate limits, retries with backoff for transient failures only, per-item success/failure reporting within a batch, and metrics, logs, and traces keyed by job ID.
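One way to make the batching and partial-failure points concrete is a worker-loop sketch like the following; the constants, the `model` callable, and the result field names are assumptions for illustration, not part of the question.

```python
import queue
import time

MAX_BATCH = 32     # cap batch size to bound latency and memory (assumed value)
MAX_WAIT_S = 0.05  # how long to wait to fill a batch before flushing (assumed)

def drain_batch(q: "queue.Queue[dict]") -> list[dict]:
    """Dynamic batching: block for one item, then greedily collect more
    until the batch is full or the wait window expires. Larger batches
    keep the accelerator busy; the window bounds added queueing latency."""
    items = [q.get()]
    deadline = time.monotonic() + MAX_WAIT_S
    while len(items) < MAX_BATCH:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            items.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return items

def run_batch(model, items: list[dict]) -> list[dict]:
    """Partial failures: one bad input fails that item, not the whole batch.
    The per-item loop stands in for a single fused forward pass with
    per-item input validation and error splitting in a real worker."""
    outputs = []
    for item in items:
        try:
            outputs.append({"id": item["id"], "status": "succeeded",
                            "output": model(item["input"])})
        except Exception as exc:  # report the failure, keep the worker alive
            outputs.append({"id": item["id"], "status": "failed",
                            "error": str(exc)})
    return outputs
```

A strong answer would also note the design trade-off these two constants encode: a larger MAX_BATCH or MAX_WAIT_S raises accelerator utilization and throughput at the cost of per-item latency, so the values should be tuned against the service's latency objective.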