Software Engineer ML System Design Interview Questions
Practice the exact questions companies are asking right now.
Design file-embedding storage system
System Design: Multimodal Embedding Service for User Uploads. Context: You are designing a backend service that, for each user-uploaded asset, generates...
Design an LLM-based binary classifier
Design a Binary Text Classifier Using Only a Log-Probability Scoring Helper. Context: You are building a binary text classifier without fine-tuning. You...
Design a Retrieval-Augmented Generation (RAG) system
Prompt: Design a Retrieval-Augmented Generation (RAG) system that answers user questions using an organization’s internal documents (PDFs, wiki pages, ...
Design pipeline using classification and embedding services
You are given two black-box ML services: (1) Classification Service. Input: one or more text documents. Output: a label for each document (e.g...
Design system to detect privacy-leak records
You are given a very large database that contains user data (both structured fields and unstructured text such as logs, messages, and documents). The ...
Design an enterprise RAG assistant for internal docs
Scenario: Design an enterprise GPT-style assistant that allows employees to ask questions about internal company documents (policies, wikis, specs, tic...
Estimate VRAM and compare model parallelism
You are reasoning about GPU memory and parallelism for a transformer-like workload dominated by matrix multiplications. Part 1: Can one matmul’s tenso...
Design comment-likelihood prediction platform
Scenario: You’re building an ML platform component that serves a model to predict the likelihood that a user will comment on a given post. The intervie...
Design NL-to-Formula assistant for Airtable
Scenario: You are given (1) an Airtable API key and a link/base/table you can read/write, and (2) an LLM API key (e.g., Claude) that you can call. Users type ...
Design a personalized recommendation system
System Design: Personalized Recommendations for a Consumer App. Context: Assume you are building the home-feed recommendations for a large consumer app ...
Design scalable, highly available GenAI serving
System Design: Highly Scalable, Highly Available Generative AI Inference Platform. Context: Design a production-grade deployment for a generative AI tex...
Explain ML compilation optimizations and hardware fit
ML Compiler Optimizations and Platform Targeting. Context: You are designing a compiler/runtime stack for deep learning workloads that must run efficien...
Build and design a Mistral RAG agent
Design and Implement a Minimal LLM-Powered RAG Agent (Python, Mistral API). Context: You are asked to build a minimal but production-minded retrieval-...
Design a low-latency ML inference API
System Design: Low‑Latency ML Inference API (Real‑Time). Context: You are designing an in‑region, synchronous inference API used by product surfaces (e....
Design a Static Audio Detection System
System Design: Static Audio Detection Pipeline. Context: Design an offline (non-live) audio detection system that processes static audio files (e.g., us...
Design an AI chatbot with browser storage
System Design: Browser-Only Chatbot With Streaming and No Server-Side Conversation Storage. Context: Design an AI chatbot where all user messages and co...
Design a multimodal embedding service
System Design: Multimodal Embedding Pipeline for Documents, Images, and Videos. You are designing a production service that computes embeddings for use...
Review an inference API design for scale
System Design Review: Machine-Learning Inference API (Distributed Systems Focus). Background: You are reviewing a teammate’s design document for a produ...
Design an ML inference orchestration platform
System Design: ML Inference Orchestration Platform. Context: You are designing a multi-tenant platform that exposes several ML models as independent ser...
Build a Mistral-powered RAG agent
Build a Minimal RAG Tool Using the Mistral API. Context: You have an API token and need to implement a small retrieval-augmented generation (RAG) tool i...