Design a batched inference API
Company: Anthropic
Role: Software Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Onsite
Quick Answer: This question evaluates the ability to design a scalable, low-latency ML inference system built around dynamic batching. A strong answer covers the overall system architecture; request batching and scheduling; model routing and versioning; and operational concerns such as autoscaling, reliability, timeouts, and observability.
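The core idea a candidate is expected to reach, dynamic batching, can be sketched in a few lines: a background loop collects incoming requests until a batch fills up or a wait deadline passes, runs one model call for the whole batch, and fans the results back to the callers. The sketch below is a minimal, hypothetical illustration (the class name, parameters, and threading approach are all assumptions, not part of the question); a production design would use an async server, bounded queues, and per-request timeouts.

```python
import threading
import queue
import time

class DynamicBatcher:
    """Minimal dynamic-batching sketch (hypothetical, for illustration).
    Collects requests up to max_batch_size, or flushes after max_wait_s
    so a lone request is not stuck waiting for a full batch."""

    def __init__(self, model_fn, max_batch_size=8, max_wait_s=0.01):
        self.model_fn = model_fn          # runs inference on a list of inputs
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()     # pending request slots
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, x):
        """Blocking call used by request handlers; returns the model output."""
        slot = {"input": x, "done": threading.Event()}
        self.requests.put(slot)
        slot["done"].wait()
        return slot["output"]

    def _loop(self):
        while True:
            # Block for the first request, then gather more until the
            # batch is full or the wait deadline passes.
            batch = [self.requests.get()]
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            # One model call for the whole batch, then fan results back out.
            outputs = self.model_fn([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["done"].set()
```

The `max_wait_s` knob is where the batch-size versus tail-latency trade-off shows up: a longer wait yields fuller batches and better throughput, at the cost of added latency for the first request in each batch.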