Machine Learning Engineer Interview Questions
Practice the exact questions companies are asking right now.
Explain KV cache in Transformer inference
Question: In Transformer-based language model inference, what is a key-value (KV) cache? Explain: - What gets cached (tensors, shapes at a high level) ...
Design a RAG system with evaluation
Scenario: You are asked to design a Retrieval-Augmented Generation (RAG) system that answers user questions using a private corpus (e.g., internal docs...
Compute time to infect all cells
You are given an n × m grid representing people in a city. - Each cell is either infected (1) or healthy (0). - Two cells are neighbors if they share ...
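The teaser above is cut off before it defines neighbors or the spread rule, so the following is only a minimal sketch under stated assumptions: neighbors share an edge (4-directional), every infected cell infects its healthy neighbors in one time step, and `-1` is returned when some cell can never be infected. The function name `time_to_infect_all` and the `-1` convention are hypothetical and do not come from the original question. Under those assumptions, a multi-source breadth-first search gives the answer as the largest distance from any initially infected cell.

```python
from collections import deque


def time_to_infect_all(grid):
    """Steps until every cell is infected, or -1 if that never happens.

    Assumptions (not from the source): 4-directional neighbors and
    one infection step per time unit.
    """
    if not grid or not grid[0]:
        return 0
    n, m = len(grid), len(grid[0])
    dist = [[-1] * m for _ in range(n)]  # -1 marks "not yet infected"
    queue = deque()

    # Seed the search with every initially infected cell (multi-source BFS).
    for r in range(n):
        for c in range(m):
            if grid[r][c] == 1:
                dist[r][c] = 0
                queue.append((r, c))

    last = 0
    while queue:
        r, c = queue.popleft()
        last = dist[r][c]  # BFS pops cells in non-decreasing distance order
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < m and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))

    # Any cell still at -1 was never reached (e.g. no infected cell at all).
    if any(d == -1 for row in dist for d in row):
        return -1
    return last
```

Each cell is enqueued at most once, so the sketch runs in O(n·m) time and space.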
How would you build an image classifier with dirty data?
Scenario: You are asked to build an image classification model (single-label, multi-class) for a product team. The image dataset is known to be dirty (...
Calibrate LLM output to match Word formatting
Scenario: You’re building an LLM-powered feature in a word processor (e.g., Microsoft Word) that generates content users can insert directly into a doc...
Compute array products excluding self and top-k
Algorithms: 1) Product of array except self (no division). Given an integer array nums of length n, return an array ans where: - ans[i] = product of all...
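For the first part only (product of array except self, no division), one common approach is a prefix pass followed by a suffix pass. The sketch below assumes that `ans[i]` means the product of every element other than `nums[i]`, as the title suggests; the function name `product_except_self` is illustrative, and the top-k part of the question is not covered because its statement is not shown.

```python
def product_except_self(nums):
    """Return ans where ans[i] is the product of all nums[j] with j != i.

    Uses two linear passes and no division, so zeros need no special case.
    """
    n = len(nums)
    ans = [1] * n

    # Forward pass: ans[i] holds the product of everything left of i.
    prefix = 1
    for i in range(n):
        ans[i] = prefix
        prefix *= nums[i]

    # Backward pass: multiply in the product of everything right of i.
    suffix = 1
    for i in range(n - 1, -1, -1):
        ans[i] *= suffix
        suffix *= nums[i]

    return ans


print(product_except_self([1, 2, 3, 4]))  # [24, 12, 8, 6]
```

This takes O(n) time and uses O(1) extra space beyond the output array.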
Implement 2D convolution forward pass
Problem: Implement the forward pass of a 2D convolution (conv2d) from scratch (no deep learning libraries). You are given: - Input tensor x with shape ...
Compare preference alignment methods for LLMs
Question: You’re asked to discuss preference alignment approaches for large language models. Task: Compare several alignment methods and explain when yo...
Optimize vector semantic search for an assistant
Scenario: You own the vector semantic search layer for an AI assistant (e.g., Copilot). Users query across enterprise documents and/or product knowledg...
Design a search query autocomplete system
Question: Design a search autocomplete system that suggests completions as the user types. Requirements: - Sub-100ms latency per keystroke. - Suggestion...
Debug online worse than offline model performance
Production ML: online performance worse than offline. You launch an ML model. Offline evaluation (validation/test) looked good, but after deployment th...
Design large-scale near-duplicate video detection
Design a product-grade fuzzy (near-)duplicate detection system for a large short-video platform. You need to detect whether an uploaded video is a nea...
Design a robot movement command system
Robot Movement (Pair Programming): You are given an empty starter repository (only a README). Implement a small, testable robot movement module that ca...
Explain Transformers and deploy an LLM safely
Answer the following LLM-focused questions. 1) Transformer basics - What problem does the Transformer architecture solve compared with RNNs? - Explain...
Debug transformer and train classifier
Debug and Fix a Transformer Text Classifier, Then Train and Evaluate It. Context: You inherit a small codebase for a transformer-based text classifier. ...
Implement PyTorch training loop
Implement a basic PyTorch training loop. You are given a PyTorch neural network model, a DataLoader that yields (inputs, targets) batches, an optimizer...
Design a search relevance prediction approach
Search relevance prediction: You are asked to predict relevance for an e-commerce search engine (given a user query and a product/document). Prompt: 1. ...
Convert stack samples to execution trace
You are given sampling-profiler output: a list of Sample objects ordered by timestamp ascending. Each Sample has (t: float, stack: list[str]) where st...
Train a classifier and analyze dataset
End-to-End Binary Classifier Workflow (EDA → Modeling → Fairness → Report): You are given a labeled tabular dataset and asked to implement a reproducib...
Implement string-based rounding without floats
Coding: You are not allowed to parse the input into a built-in floating type (to avoid overflow and precision issues). Work directly on strings. 1) Imp...