ML System Design Interview Questions
Practice the exact questions companies are asking right now.
Design a RAG system with evaluation
Scenario: You are asked to design a Retrieval-Augmented Generation (RAG) system that answers user questions using a private corpus (e.g., internal docs...
How would you build an image classifier with dirty data?
Scenario: You are asked to build an image classification model (single-label, multi-class) for a product team. The image dataset is known to be dirty (...
What skills are needed for AI infra roles?
You interviewed for an AI infrastructure / LLM serving internship role and were told the rejection reason was insufficient familiarity with vLLM, incl...
Calibrate LLM output to match Word formatting
Scenario: You’re building an LLM-powered feature in a word processor (e.g., Microsoft Word) that generates content users can insert directly into a doc...
Optimize vector semantic search for an assistant
Scenario: You own the vector semantic search layer for an AI assistant (e.g., Copilot). Users query across enterprise documents and/or product knowledg...
Debug online worse than offline model performance
Production ML: online performance worse than offline. You launch an ML model. Offline evaluation (validation/test) looked good, but after deployment th...
Explain Transformers and deploy an LLM safely
Answer the following LLM-focused questions. 1) Transformer basics: What problem does the Transformer architecture solve compared with RNNs? Explain...
Design a RAG-based assistant service
Scenario: You need to build a Retrieval-Augmented Generation (RAG) assistant for an enterprise product. It should answer questions using internal docum...
Infer user intent from typing in real time
Scenario: You’re building an AI feature that observes a user’s typing stream in an editor/search box and predicts the user’s intent in real time. This ...
Design a low-latency ML inference API
System Design: Low‑Latency ML Inference API (Real‑Time). Context: You are designing an in‑region, synchronous inference API used by product surfaces (e....
Design a chatbot fallback for unknown questions
Scenario: You run a ChatGPT-like assistant. Users sometimes ask questions the model cannot answer reliably (unknown/uncertain/needs up-to-date facts). ...
Design pipeline using classification and embedding services
You are given two black-box ML services: 1. Classification Service. Input: one or more text documents. Output: a label for each document (e.g...
Debug MNIST denoiser training
Debugging a Colab Denoising Network on MNIST. Goal: Make a Colab notebook that trains a denoising neural network on MNIST such that (a) the training...
Design NL-to-Formula assistant for Airtable
Scenario: You are given an Airtable API key and a link/base/table you can read/write, plus an LLM API key (e.g., Claude) that you can call. Users type ...
Design a batch inference API
System Design: Async Inference Service API (POST Job, Poll for Results). Context: You are designing an asynchronous inference service where clients subm...
Design an image/video near-duplicate detection system
Question: Design a system to detect near-duplicate images/videos (e.g., reuploads, minor edits, different encodes) at large scale. Requirements: Suppo...
Build and design a Mistral RAG agent
Design and Implement a Minimal LLM-Powered RAG Agent (Python, Mistral API). Context: You are asked to build a minimal but production-minded retrieval-...
Design an LLM-based binary classifier
Design a Binary Text Classifier Using Only a Log-Probability Scoring Helper. Context: You are building a binary text classifier without fine-tuning. You...
Estimate VRAM and compare model parallelism
You are reasoning about GPU memory and parallelism for a transformer-like workload dominated by matrix multiplications. Part 1: Can one matmul’s tenso...
Implement a trie-based tokenizer
Design and Implement a Trie-Based Subword Tokenizer for LLM Pretraining. Context: You are building a subword tokenizer for a large-scale LLM pretraining...