Design Large-Scale Inference Serving
Company: Waymo
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of large-scale ML inference systems. It assesses capacity planning, latency and tail-latency engineering, memory and bandwidth estimation, hardware selection (CPUs, GPUs, or specialized accelerators), batching and caching trade-offs, and reliability concerns such as out-of-memory prevention and recovery. It is commonly asked to test practical system-design skills for production deployment: candidates are expected to produce back-of-the-envelope QPS and resource estimates and to reason about operational trade-offs. As an ML system design question, it emphasizes practical application over purely conceptual theory.
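
As an illustration of the kind of back-of-the-envelope estimate mentioned above, here is a minimal Python sketch. Every number in it (peak QPS, batch size, batch latency, utilization target, parameter count, bytes per parameter) is a hypothetical assumption chosen for the example, not a figure from the question, and the function names are made up for illustration. It assumes each replica processes one batch at a time and ignores queueing delay, activation memory, and network overhead.

```python
import math


def replicas_needed(peak_qps, batch_size, batch_latency_s, target_utilization=0.6):
    """Estimate replicas needed to serve peak_qps.

    Assumes one batch in flight per replica, so a replica's throughput is
    batch_size / batch_latency_s. target_utilization leaves headroom so that
    bursts do not immediately inflate tail latency.
    """
    per_replica_qps = batch_size / batch_latency_s
    usable_qps = per_replica_qps * target_utilization
    return math.ceil(peak_qps / usable_qps)


def weight_memory_gib(num_params, bytes_per_param=2):
    """Memory for model weights only (2 bytes per parameter assumes FP16/BF16)."""
    return num_params * bytes_per_param / 2**30


# Hypothetical workload: 10,000 QPS at peak, batches of 8 taking 20 ms each.
replicas = replicas_needed(peak_qps=10_000, batch_size=8, batch_latency_s=0.020)

# Hypothetical model: 100M parameters stored in 16-bit precision.
weights_gib = weight_memory_gib(num_params=100_000_000)

print(f"replicas needed: {replicas}")
print(f"weight memory per replica: {weights_gib:.3f} GiB")
```

With these assumed inputs, each replica sustains 400 QPS at full load, or 240 QPS at the 60% utilization target, which gives 42 replicas. A fuller answer would add activation and batch-buffer memory on top of the weights to check against the accelerator's capacity and avoid out-of-memory failures.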