Machine Learning Interview Questions
Practice the exact questions companies are asking right now.
How to predict vehicles’ turn direction at an intersection?
At an intersection, there are N vehicles stopped or moving slowly. For each vehicle you have historical time-series data up to the current time: - Pos...
Debug and fix a PyTorch Transformer training loop
Minimal Causal LM Debugging and Optimization Context You are given a tiny causal decoder-only language model implemented in PyTorch. It appears to "tr...
Explain KNN and how to tune it
K-Nearest Neighbors (KNN) fundamentals You are interviewing for a Data Scientist role. 1. Explain how the KNN algorithm works for both classification ...
Design features for house price prediction
Scenario You are building a model to predict house sale price from a tabular dataset (similar to typical real-estate datasets). The interviewer expect...
Design a lead-scoring model
Context You are interviewing for a Data Scientist role on a marketing/growth team. The business wants lead scoring: ranking or scoring incoming leads ...
Compare two rare-event detection models statistically
You are evaluating two models (Model A and Model B) for rare-event detection (e.g., fraud, abuse, medical adverse event). Positives are extremely rare...
Explain leakage, missing data, and common losses
Answer the following traditional ML questions: 1. Data leakage - What is data leakage? - Give 2–3 common examples. - How do you prevent or fi...
Compute and plot a precision–recall curve
Precision–Recall (PR) curve coding / evaluation You are given a binary classifier’s outputs on a dataset: - y_true: array of true labels in \(\{0,1\}\...
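As a rough illustration of what this question is after, here is a minimal NumPy sketch of computing the points of a PR curve. The teaser is cut off after `y_true`, so the second input (`y_score`, assumed to be one predicted score per example), the function name, and the tie-handling choice are all assumptions for illustration, not part of the original prompt. The sketch also assumes at least one positive label.

```python
import numpy as np

def pr_curve_points(y_true, y_score):
    """Return (precision, recall, thresholds), one entry per distinct score.

    y_score is a hypothetical input name: higher score = more likely positive.
    Assumes y_true contains at least one positive label.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)

    # Sort examples from highest to lowest score.
    order = np.argsort(-y_score, kind="mergesort")
    y_sorted = y_true[order]
    s_sorted = y_score[order]

    # Cumulative true/false positives if we predict positive for the top-k examples.
    tp = np.cumsum(y_sorted)
    fp = np.cumsum(1 - y_sorted)

    # Keep only the last index of each run of tied scores,
    # so tied examples share a single threshold.
    last = np.r_[np.nonzero(np.diff(s_sorted))[0], len(s_sorted) - 1]
    tp, fp = tp[last], fp[last]

    precision = tp / (tp + fp)
    recall = tp / y_true.sum()
    return precision, recall, s_sorted[last]

precision, recall, thresholds = pr_curve_points(
    [0, 1, 1, 0, 1], [0.1, 0.9, 0.8, 0.7, 0.3]
)
```

Plotting is then a matter of drawing `precision` against `recall` (for example as a step plot in matplotlib); that part is left out here to keep the sketch dependency-free.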
Debug transformer and train classifier
Debug and Fix a Transformer Text Classifier, Then Train and Evaluate It Context You inherit a small codebase for a transformer-based text classifier. ...
Implement and Debug Backprop in NumPy
Two-Layer Neural Network: Backpropagation and Gradient Check (NumPy) Context You are implementing a fully connected two-layer neural network for multi...
Train a classifier and analyze dataset
End-to-End Binary Classifier Workflow (EDA → Modeling → Fairness → Report) You are given a labeled tabular dataset and asked to implement a reproducib...
Debug a transformer training pipeline
Diagnose a Diverging PyTorch Transformer Training Run You are given a PyTorch Transformer training pipeline whose loss diverges and validation accurac...
Derive correlation bounds and omitted-variable bias
Core Statistics Prompt Answer the following related statistics questions. Part A — Pairwise correlation constraints Let \(X, Y, Z\) be random variable...
Explain bias–variance, overfitting, and vanishing gradients
Answer the following ML fundamentals questions: 1. Bias–variance tradeoff: What are bias and variance? How do they relate to underfitting/overfitting?...
Explain project details, PCA, and SHAP
Interview prompt (ML project deep dive) You are interviewing for a Data Scientist role. The interviewer asks you to pick one ML project you have perso...
Compare preference alignment methods for LLMs
Question You’re asked to discuss preference alignment approaches for large language models. Task Compare several alignment methods and explain when yo...
Diagnose Transformer training and inference bugs
Debugging a Transformer That Intermittently Throws Shape/Type Errors and Fails to Converge You are given a Transformer-based sequence model that: - In...
Explain core ML fundamentals and tradeoffs
ML Fundamentals Interview Prompt Answer the following ML fundamentals questions clearly and with practical examples: 1. Bias vs. variance - What ar...
Compare Random Forests and Boosted Trees: Bias, Variance, Speed
Scenario A product/data science team is deciding between Random Forests and Gradient-Boosted Decision Trees (e.g., XGBoost) for a new predictive task....
Explain FlashAttention, KV cache, and RoPE
You are interviewing for an LLM-focused role. 1. FlashAttention - Explain what problem it solves in transformer attention. - Describe the high-l...