Design an ML inference orchestration platform
Company: Palo Alto Networks
Role: Software Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Onsite
Your company exposes multiple ML models as independent services (e.g., classification, embeddings, re-ranking). Design a platform that orchestrates these models into end-to-end workflows for external users. Describe request routing/orchestration, data flow between models, schemas and validation, intermediate storage/caching, final results storage, model/version management, failure handling and retries, monitoring/traceability, scaling and deployment, and multi-tenant auth/rate limiting. Justify when to use RPC versus REST between services, and discuss the trade-offs in latency, schema evolution, observability, and backward compatibility.
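The orchestration and retry portions of the prompt can be sketched in a few lines. This is a minimal illustration, not a production design: the three model "services" are hypothetical in-process stubs (in the real platform each step would be a gRPC/REST call to an independently deployed model), and the step/retry structure is an assumption for demonstration.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List

# Hypothetical stub model services; in practice each would be a network
# call to an independently versioned inference service.
def classify(doc: str) -> dict:
    return {"doc": doc, "label": "benign"}

def embed(payload: dict) -> dict:
    payload["embedding"] = [float(len(payload["doc"]))]
    return payload

def rerank(payload: dict) -> dict:
    payload["score"] = sum(payload["embedding"])
    return payload

@dataclass
class Step:
    name: str
    fn: Callable[[Any], Any]
    max_retries: int = 3

def run_workflow(steps: List[Step], payload: Any) -> Any:
    """Run steps sequentially, retrying each with exponential backoff."""
    for step in steps:
        for attempt in range(step.max_retries):
            try:
                payload = step.fn(payload)
                break
            except Exception:
                if attempt == step.max_retries - 1:
                    raise  # surface the failure after exhausting retries
                time.sleep(0.1 * 2 ** attempt)
    return payload

result = run_workflow(
    [Step("classify", classify), Step("embed", embed), Step("rerank", rerank)],
    "suspicious.exe",
)
```

A stronger answer would replace the linear list with a DAG, persist intermediate outputs between steps, and make retries idempotent via request IDs.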
Quick Answer: This question evaluates system-design and ML-infrastructure competencies: orchestration of multi-model inference workflows, distributed data flow and storage, model versioning and management, scaling and deployment, failure handling and retries, observability, and multi-tenant security and quota management.
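The multi-tenant quota piece is often answered with a token bucket per tenant. Below is a minimal sketch under stated assumptions: `TokenBucket` and `check_quota` are hypothetical names, and in production the per-tenant state would live in a shared store (e.g., Redis) rather than in-process memory.

```python
import time

class TokenBucket:
    """Per-tenant token bucket: capacity bounds bursts, refill_rate
    bounds sustained throughput (tokens per second)."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Hypothetical registry mapping tenant IDs to their buckets; limits
# would come from each tenant's plan configuration.
buckets: dict[str, TokenBucket] = {}

def check_quota(tenant_id: str, capacity: float = 5, rate: float = 1.0) -> bool:
    bucket = buckets.setdefault(tenant_id, TokenBucket(capacity, rate))
    return bucket.allow()
```

With a zero refill rate the bucket degenerates to a fixed budget, which makes the burst-limiting behavior easy to demonstrate in an interview.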