PracHub

Design a cloud AI inference platform

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in ML system design and operational engineering, covering model packaging and versioning, hardware selection (CPU vs GPU), autoscaling, request routing, multi-tenant isolation, SLO-driven observability, cost controls, and failure handling within a cloud inference platform.

  • hard
  • Lambda
  • ML System Design
  • Software Engineer

Company: Lambda

Role: Software Engineer

Category: ML System Design

Difficulty: hard

Interview Round: HR Screen

Design a cloud-based AI inference platform for real-time and batch workloads. Specify model packaging and versioning, hardware selection (CPU vs. GPU), autoscaling, request routing, and multi-tenant isolation. Discuss latency/throughput targets, cost controls (right-sizing, spot instances), observability (tracing, metrics), safe rollout strategies (canary, shadow), and failure handling. Describe how you would integrate with Kubernetes and a model registry, and compare your approach to similar industry offerings.

Sep 6, 2025, 12:00 AM

System Design: Cloud AI Inference Platform for Real-Time and Batch

Context

Design a multi-tenant cloud platform that serves machine learning models for both real-time (online) and batch (offline) workloads. The platform should support multiple model frameworks and versions, meet latency and throughput targets, and provide strong isolation, observability, and cost control.
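
The latency and throughput targets in this context are linked to fleet size through Little's law: the mean number of in-flight requests equals arrival rate times mean latency. The sketch below is one way to turn that into a rough replica count for an online model. The function name `replicas_needed`, its parameters, and the 0.7 utilization headroom are illustrative assumptions, not part of the question, and the estimate ignores batching effects and tail latency.

```python
import math


def replicas_needed(peak_qps: float,
                    mean_latency_s: float,
                    concurrency_per_replica: int,
                    target_utilization: float = 0.7) -> int:
    """Rough replica count from Little's law (hypothetical helper).

    in_flight = peak_qps * mean_latency_s is the mean number of requests
    being served at once. Each replica is assumed to handle
    `concurrency_per_replica` requests concurrently, and we only plan to
    load it up to `target_utilization` to leave headroom for bursts.
    """
    if peak_qps < 0 or mean_latency_s <= 0:
        raise ValueError("peak_qps must be >= 0 and mean_latency_s > 0")
    if concurrency_per_replica < 1 or not 0 < target_utilization <= 1:
        raise ValueError("invalid concurrency or utilization")
    in_flight = peak_qps * mean_latency_s
    usable_slots = concurrency_per_replica * target_utilization
    # Always keep at least one replica unless scale-to-zero is handled elsewhere.
    return max(1, math.ceil(in_flight / usable_slots))


if __name__ == "__main__":
    # Assumed example workload: 200 QPS at 50 ms mean latency,
    # 4 concurrent requests per replica.
    print(replicas_needed(200, 0.05, 4))
```

A figure like this is only a starting point for right-sizing; an autoscaler would still adjust the count from observed queue depth or latency.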

Requirements

  1. Model Packaging and Versioning
    • How will models (artifacts + code + environment) be packaged, versioned, and promoted across environments?
  2. Hardware Selection
    • When should a model or workload run on CPU versus GPU? Consider quantization, compilation, and GPU sharing.
  3. Autoscaling Strategy
    • Horizontal/vertical autoscaling for online and batch; scale-to-zero for idle models.
  4. Request Routing
    • Routing by model/version/tenant; consistent hashing; priority and rate limiting; streaming for tokens.
  5. Multi-Tenant Isolation
    • Compute/storage/network isolation; quotas; fairness; GPU partitioning.
  6. SLO Targets
    • Propose latency and throughput targets for representative model types (e.g., small classifiers, embeddings, LLM text generation) and batch SLAs.
  7. Cost Controls
    • Right-sizing; spot/preemptible instances; model optimization (quantization, distillation); bin packing; budgets/chargeback.
  8. Observability
    • Tracing, metrics, logs; SLI/SLOs; GPU metrics; model quality/drift.
  9. Safe Rollouts
    • Canary and shadow deployments; rollback criteria; guardrails.
  10. Failure Handling
    • Retries/backoff; circuit breaking; regional failover; GPU OOM; degraded modes.
  11. Kubernetes Integration
    • Ingress, scheduling (GPUs/MIG), operators/CRDs, HPA/KEDA, service mesh, persistent caches.
  12. Model Registry Integration
    • Registry choices, signatures/schemas, lineage; CI/CD from registry to serving.
  13. Industry Comparison
    • Briefly compare to managed offerings and open-source stacks.
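
Requirement 4 asks for routing by model, version, and tenant with consistent hashing. As a minimal sketch of that one technique, the ring below maps a (tenant, model, version) key to a serving replica so that repeated requests for the same model tend to land where its weights are already loaded. The class name `HashRing`, the key format, the replica names, and the choice of 64 virtual nodes are all illustrative assumptions; a real router would also need health checks, load awareness, and rate limiting.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit position on the ring (md5 is used only to spread keys)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, replicas=(), vnodes: int = 64):
        self._vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> replica name
        for replica in replicas:
            self.add(replica)

    def add(self, replica: str) -> None:
        for i in range(self._vnodes):
            point = _hash(f"{replica}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision: keep the first owner
            bisect.insort(self._points, point)
            self._owner[point] = replica

    def remove(self, replica: str) -> None:
        kept = [p for p in self._points if self._owner[p] != replica]
        for p in self._points:
            if self._owner[p] == replica:
                del self._owner[p]
        self._points = kept

    def route(self, tenant: str, model: str, version: str) -> str:
        """Return the replica owning the first ring point after the key."""
        if not self._points:
            raise LookupError("no replicas registered")
        key = f"{tenant}/{model}:{version}"
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = HashRing(["gpu-a", "gpu-b", "gpu-c"])
    print(ring.route("tenant-1", "embedder", "v3"))
```

The property that makes this attractive for model serving is that removing a replica only remaps the keys that replica owned, so warm model caches on the remaining replicas are not disturbed.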
