Design a cloud AI inference platform
Company: Lambda
Role: Software Engineer
Category: ML System Design
Difficulty: Hard
Interview Round: HR Screen
Design a cloud-based AI inference platform for real-time and batch workloads. Specify model packaging and versioning, hardware selection (CPU vs. GPU), autoscaling, request routing, and multi-tenant isolation. Discuss latency/throughput targets, cost controls (right-sizing, spot instances), observability (tracing, metrics), safe rollout strategies (canary, shadow), and failure handling. Describe how you would integrate with Kubernetes and a model registry, and compare your approach to similar industry offerings.
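One concrete piece of the routing and safe-rollout discussion is weighted canary traffic splitting between model versions. A minimal sketch is below; the version names and weights are hypothetical, and a production router would typically live in a service mesh or gateway rather than application code.

```python
import random

def route_request(versions, rng=random.random):
    """Pick a model version by canary weight.

    versions: list of (version_name, weight) pairs whose weights sum to 1.0,
    e.g. 95% of traffic to the stable version and 5% to the canary.
    """
    r = rng()
    cumulative = 0.0
    for name, weight in versions:
        cumulative += weight
        if r < cumulative:
            return name
    return versions[-1][0]  # guard against floating-point rounding

# Hypothetical rollout: stable "resnet50-v1" with a 5% canary "resnet50-v2".
canary_split = [("resnet50-v1", 0.95), ("resnet50-v2", 0.05)]
choice = route_request(canary_split)
```

In an interview answer, this pairs naturally with shadow deployment: the canary serves a small live slice, while a shadow copy receives mirrored traffic whose responses are discarded but measured.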
Quick Answer: Package models as versioned, immutable artifacts tracked in a model registry and served from containers on Kubernetes. Route real-time traffic to GPU-backed pods sized for the latency SLO, and schedule batch workloads on cheaper right-sized or spot capacity. Autoscale on request rate and queue depth, enforce multi-tenant isolation with per-tenant quotas and namespaces, and instrument every request with tracing and metrics. Roll out new model versions via canary and shadow deployments, and handle failures with retries, timeouts, and fallback to the last known-good version.
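The autoscaling and right-sizing points above reduce to a capacity calculation: how many replicas are needed for an expected arrival rate, given each replica's measured throughput and a utilization headroom so latency stays stable under bursts. A minimal sketch, assuming hypothetical numbers for per-replica throughput and a 70% utilization target:

```python
import math

def replicas_needed(arrival_rps, per_replica_rps, headroom=0.7):
    """Right-size the replica count for an inference service.

    arrival_rps:      expected request rate (requests/second)
    per_replica_rps:  measured sustainable throughput of one replica
    headroom:         target utilization per replica (< 1.0), leaving
                      slack to absorb bursts without violating the SLO
    """
    effective_capacity = per_replica_rps * headroom
    return max(1, math.ceil(arrival_rps / effective_capacity))

# Hypothetical load: 1,000 req/s against replicas benchmarked at 120 req/s.
count = replicas_needed(1000, 120)
```

An autoscaler (e.g. a Kubernetes HPA driven by a custom request-rate metric) would re-evaluate this continuously rather than once, but the sizing logic is the same.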