ML System Design Interview Questions
Practice the exact questions companies are asking right now.
Design a RAG system with evaluation
Scenario: You are asked to design a Retrieval-Augmented Generation (RAG) system that answers user questions using a private corpus (e.g., internal docs...
Design a fraud detection system
Scenario: You are designing an end-to-end fraud detection system for an online platform (e.g., e-commerce marketplace, payments, account signup, or ad ...
Design a Jira bug-to-team classification system
Problem: Design a system that automatically classifies incoming Jira bug tickets into the most appropriate owning team, and produces a report for custo...
How would you build an image classifier with dirty data?
Scenario: You are asked to build an image classification model (single-label, multi-class) for a product team. The image dataset is known to be dirty (...
Design an ads ranking system with calibration
ML System Design: Ads Ranking (e-commerce). Design an online ads ranking (ad “re-ranking”) system for an e-commerce app. The system receives a request ...
Design a real-time home feed ranker
Scenario: Design a real-time home feed (e.g., social or content platform) that is responsive to user engagement. Users open the app and see a ranked li...
Design a pipeline using classification and embedding services
You are given two black-box ML services: (1) Classification Service. Input: one or more text documents. Output: a label for each document (e.g...
Calibrate LLM output to match Word formatting
Scenario: You’re building an LLM-powered feature in a word processor (e.g., Microsoft Word) that generates content users can insert directly into a doc...
Design a computer-use agent end-to-end
Scenario: You are designing a computer-use agent that can complete user tasks on a standard desktop environment by observing the screen and issuing act...
What skills are needed for AI infra roles?
You interviewed for an AI infrastructure / LLM serving internship role and were told the rejection reason was insufficient familiarity with vLLM, incl...
Estimate VRAM and compare model parallelism
You are reasoning about GPU memory and parallelism for a transformer-like workload dominated by matrix multiplications. Part 1: Can one matmul’s tenso...
Design an unsafe content detection system
Scenario: You are building a system that detects and mitigates unsafe user-generated content (UGC) on a large platform. Unsafe content can include: hat...
Design an NL-to-Formula assistant for Airtable
Scenario: You are given (1) an Airtable API key and a link/base/table you can read/write, and (2) an LLM API key (e.g., Claude) that you can call. Users type ...
Design a low-latency ML inference API
System Design: Low‑Latency ML Inference API (Real‑Time). Context: You are designing an in‑region, synchronous inference API used by product surfaces (e....
Infer user intent from typing in real time
Scenario: You’re building an AI feature that observes a user’s typing stream in an editor/search box and predicts the user’s intent in real time. This ...
Optimize vector semantic search for an assistant
Scenario: You own the vector semantic search layer for an AI assistant (e.g., Copilot). Users query across enterprise documents and/or product knowledg...
Design a game recommendation modeling approach
Scenario: You are building a personalized game recommender for a consumer app/store. The goal is to recommend a ranked list of games to each user to in...
Build and design a Mistral RAG agent
Design and Implement a Minimal LLM-Powered RAG Agent (Python, Mistral API). Context: You are asked to build a minimal but production-minded retrieval-...
Design an image/video near-duplicate detection system
Question: Design a system to detect near-duplicate images/videos (e.g., reuploads, minor edits, different encodes) at large scale. Requirements: Suppo...
Debug a model that performs worse online than offline
Production ML: online performance worse than offline. You launch an ML model. Offline evaluation (validation/test) looked good, but after deployment th...