Data Scientist Machine Learning Interview Questions
Practice the exact questions companies are asking right now.

Debug and fix a PyTorch Transformer training loop
Minimal Causal LM Debugging and Optimization Context You are given a tiny causal decoder-only language model implemented in PyTorch. It appears to "tr...
How to predict vehicles’ turn direction at an intersection?
At an intersection, there are N vehicles stopped or moving slowly. For each vehicle you have historical time-series data up to the current time: - Pos...
Explain KNN and how to tune it
K-Nearest Neighbors (KNN) fundamentals You are interviewing for a Data Scientist role. 1. Explain how the KNN algorithm works for both classification ...
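The following minimal pure-Python sketch accompanies the KNN prompt. It assumes Euclidean distance, an unweighted majority vote for classification, the mean of the neighbors' targets for regression, and leave-one-out accuracy as the criterion for choosing k. The function names (`knn_predict`, `loo_accuracy`, `tune_k`) are hypothetical and do not come from the original question.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k, task="classification"):
    """Predict for one query point x from its k nearest training points."""
    # Sort training points by Euclidean distance to the query; keep the k closest.
    neighbors = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    labels = [label for _, label in neighbors]
    if task == "classification":
        # Majority vote; on ties, most_common keeps first-encountered order.
        return Counter(labels).most_common(1)[0][0]
    # Regression: average of the neighbors' targets.
    return sum(labels) / len(labels)


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: predict each point from all the other points."""
    correct = 0
    for i in range(len(X)):
        X_rest = X[:i] + X[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        correct += knn_predict(X_rest, y_rest, X[i], k) == y[i]
    return correct / len(X)


def tune_k(X, y, candidates):
    """Return the candidate k with the best LOO accuracy (smallest k on ties)."""
    return max(candidates, key=lambda k: (loo_accuracy(X, y, k), -k))
```

Small k gives low bias and high variance, and large k smooths the decision boundary. Because distances drive every prediction, you would normally standardize features before running either function.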
Compare two rare-event detection models statistically
You are evaluating two models (Model A and Model B) for rare-event detection (e.g., fraud, abuse, medical adverse event). Positives are extremely rare...
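McNemar's test on the disagreements between two classifiers is one standard way to compare them when both are scored on the same test set. It is offered here as an illustrative choice, not as the answer the prompt expects. The sketch below uses a hypothetical function name and computes the exact binomial version of the test from paired 0/1 predictions. Positives are extremely rare, so you might run it on the positive cases alone to compare recall. Overall accuracy would otherwise be dominated by the negatives.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired 0/1 predictions.

    Returns (a_only, b_only, p_value). a_only counts cases that model A gets
    right and model B gets wrong; b_only counts the reverse.
    """
    a_only = b_only = 0
    for t, pa, pb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = pa == t, pb == t
        if a_ok and not b_ok:
            a_only += 1
        elif b_ok and not a_ok:
            b_only += 1
    n = a_only + b_only  # only discordant pairs carry information
    if n == 0:
        return a_only, b_only, 1.0
    # Under H0 each discordant pair is equally likely to favor A or B,
    # so a_only ~ Binomial(n, 0.5). Double the smaller tail (two-sided).
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)
```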
Design features for house price prediction
Scenario You are building a model to predict house sale price from a tabular dataset (similar to typical real-estate datasets). The interviewer expect...
Compute and plot a precision–recall curve
Precision–Recall (PR) curve coding / evaluation You are given a binary classifier’s outputs on a dataset: - y_true: array of true labels in \(\{0,1\}\...
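The following minimal sketch computes the curve. It assumes the truncated prompt also supplies real-valued scores, called `y_score` here (a hypothetical name), where a higher score means "more likely positive". The code sweeps thresholds from high to low, emits one (precision, recall) point per distinct score, and summarizes the curve with step-wise average precision. It follows the same idea as scikit-learn's `precision_recall_curve` but is a standalone, standard-library-only version, and it leaves out plotting.

```python
def pr_curve_points(y_true, y_score):
    """Return (precisions, recalls, thresholds), one point per distinct score.

    y_true holds 0/1 labels. y_score holds scores where higher means "more
    likely positive". A sample is predicted positive when score >= threshold.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("recall is undefined when there are no positives")
    # Visit samples from the highest score to the lowest.
    pairs = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    precisions, recalls, thresholds = [], [], []
    tp = fp = 0
    i = 0
    while i < len(pairs):
        thr = pairs[i][0]
        # Consume every sample tied at this score so ties share one threshold.
        while i < len(pairs) and pairs[i][0] == thr:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / total_pos)
        thresholds.append(thr)
    return precisions, recalls, thresholds


def average_precision(precisions, recalls):
    """Step-wise AP: precision at each point weighted by the recall gained there."""
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

To draw the PR curve, plot `recalls` on the x-axis against `precisions` on the y-axis, for example with matplotlib's `plt.step(recalls, precisions, where="post")`.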
Design a lead-scoring model
Context You are interviewing for a Data Scientist role on a marketing/growth team. The business wants lead scoring: ranking or scoring incoming leads ...
Derive correlation bounds and omitted-variable bias
Core Statistics Prompt Answer the following related statistics questions. Part A — Pairwise correlation constraints Let \(X, Y, Z\) be random variable...
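The truncated parts of the prompt are not reproduced here, but two standard results usually underlie questions of this kind. The notation is assumed rather than taken from the prompt: \(\rho_{XY}\), \(\rho_{XZ}\), \(\rho_{YZ}\) for the pairwise correlations, and lowercase \(y, x, z\) with coefficients \(\beta_1, \beta_2\) for the regression.

For the correlation constraints, a valid correlation matrix must be positive semidefinite, so its determinant is nonnegative. This bounds \(\rho_{XZ}\) once the other two correlations are fixed.

For omitted-variable bias, suppose the true model contains a second regressor \(z\) but you regress \(y\) on \(x\) alone. The slope then picks up \(\beta_2\) times the slope of \(z\) on \(x\), assuming \(\varepsilon\) is uncorrelated with both \(x\) and \(z\).

```latex
\[
R=\begin{pmatrix}1 & \rho_{XY} & \rho_{XZ}\\ \rho_{XY} & 1 & \rho_{YZ}\\ \rho_{XZ} & \rho_{YZ} & 1\end{pmatrix}\succeq 0
\;\Rightarrow\;
\det R = 1-\rho_{XY}^2-\rho_{XZ}^2-\rho_{YZ}^2+2\rho_{XY}\rho_{XZ}\rho_{YZ}\ge 0
\]
\[
\Rightarrow\;
\rho_{XY}\rho_{YZ}-\sqrt{(1-\rho_{XY}^2)(1-\rho_{YZ}^2)}
\;\le\;\rho_{XZ}\;\le\;
\rho_{XY}\rho_{YZ}+\sqrt{(1-\rho_{XY}^2)(1-\rho_{YZ}^2)}
\]
\[
y=\beta_0+\beta_1 x+\beta_2 z+\varepsilon
\quad\Longrightarrow\quad
\tilde\beta_1 \xrightarrow{\;p\;} \beta_1+\beta_2\,\frac{\operatorname{Cov}(x,z)}{\operatorname{Var}(x)}
\]
```

For example, \(\rho_{XY}=\rho_{YZ}=0.9\) gives bounds of \(0.81 \pm 0.19\), so \(\rho_{XZ}\in[0.62,\,1]\). Strongly correlated pairs therefore force the third correlation to be strongly positive as well.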
Compare Random Forests and Boosted Trees: Bias, Variance, Speed
Scenario A product/data science team is deciding between Random Forests and Gradient-Boosted Decision Trees (e.g., XGBoost) for a new predictive task....
Explain project details, PCA, and SHAP
Interview prompt (ML project deep dive) You are interviewing for a Data Scientist role. The interviewer asks you to pick one ML project you have perso...
Forecast bikes available at a station
Data Analysis / Forecasting Prompt You are given historical Citi Bike (bike-share) trip and station status data. Each station has a fixed dock capacit...
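Because each station has a fixed dock capacity, any forecast of available bikes has to stay within \([0, \text{capacity}]\). The minimal sketch below assumes you already have per-interval net-flow forecasts (returns minus departures) from some upstream model, which is not specified in the prompt. It rolls the current count forward and clips at every step rather than only at the end. The function and argument names are hypothetical.

```python
def forecast_available_bikes(current_bikes, capacity, net_flow):
    """Roll the bike count forward one interval at a time.

    net_flow[t] is a hypothetical forecast of (returns - departures) in
    interval t. The count is clipped to [0, capacity] after every interval,
    because a full station cannot accept more bikes and an empty one cannot
    lend any.
    """
    level = current_bikes
    path = []
    for flow in net_flow:
        level = min(capacity, max(0, level + flow))
        path.append(level)
    return path
```

Clipping per step matters. Starting from 5 bikes with capacity 10 and net flows of `[3, 4, -20, 2]`, per-step clipping ends at 2 bikes. Summing the flows first and clipping once would give 0.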
Build a regularized regression pipeline
Technical Screen: End‑to‑End Signup Prediction with scikit‑learn Context You are given a cleaned tabular dataset with marketing and product metrics. Y...
Implement K-means and handle train-inference mismatch
Part A — K-means (implementation + concepts) You are given a dataset \(X \in \mathbb{R}^{n \times d}\) and an integer \(k\). 1. Explain K-means: what ...
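The following minimal pure-Python sketch implements Lloyd's algorithm. It initializes centroids from k randomly chosen data rows, then alternates assignment and mean-update steps until the centroids stop moving. The rest of the prompt is truncated, so the prediction function illustrates only one generic point about train-inference consistency: new points are assigned to the centroids learned at training time and are never re-clustered. If you scale features, the scaler should likewise be fit on training data and reused at inference. The function names are hypothetical.

```python
import random


def _dist2(a, b):
    # Squared Euclidean distance (enough for finding the nearest centroid).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point, centroids):
    # Index of the closest centroid; ties go to the lower index.
    return min(range(len(centroids)), key=lambda j: _dist2(point, centroids[j]))


def kmeans_fit(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]  # k randomly chosen rows
    for _ in range(n_iter):
        labels = [_nearest(x, centroids) for x in X]
        new_centroids = []
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change: converged
        centroids = new_centroids
    return centroids


def kmeans_predict(X_new, centroids):
    # Inference reuses the training centroids; nothing is re-fit on new data.
    return [_nearest(x, centroids) for x in X_new]
```

Lloyd's algorithm only reaches a local optimum that depends on initialization. In practice you would run several seeds, or use k-means++ seeding, and keep the solution with the lowest within-cluster sum of squares.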
Explain core probability and ML statistics concepts
Answer the following short theory questions (you may use equations and brief examples): Probability 1. You roll two fair six-sided dice. - What is ...
Design and diagnose a regression pipeline
CLV_90 Prediction Pipeline under Zero-Inflation, Heavy Tails, and Multicollinearity Context You need to predict 90-day customer value (CLV_90) at the ...
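A two-part (hurdle) model is one common way, among several, to handle zero-inflation combined with heavy tails. It pairs a classifier for \(P(\text{CLV\_90} > 0)\) with a regression on \(\log(\text{CLV\_90})\) that is fit only on positive customers. Naively exponentiating the log-scale prediction tends to underestimate the conditional mean. The sketch below therefore applies Duan's smearing factor, which is the average of the exponentiated log-scale residuals. The classifier and regressor are not shown: their outputs are treated as given inputs, and all names are hypothetical.

```python
import math


def smearing_factor(log_residuals):
    """Duan's smearing estimate: mean of exp(residual) from the log-scale fit."""
    return sum(math.exp(r) for r in log_residuals) / len(log_residuals)


def hurdle_expected_clv(p_positive, log_pred, smear):
    """E[CLV] = P(CLV > 0) * E[CLV | CLV > 0].

    p_positive: output of a (hypothetical) purchase classifier.
    log_pred:   prediction of log(CLV) from a regressor fit on positive customers.
    smear:      retransformation factor correcting the bias of exp(log_pred).
    """
    return p_positive * math.exp(log_pred) * smear
```

When the log-scale residuals average zero, the smearing factor is at least 1 by Jensen's inequality. This is why skipping it biases the predicted mean downward.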
Explain L1 vs L2 and ridge vs lasso
Explain the differences between: 1. L1 vs L2 regularization (how they change the objective, geometry/intuitions, and typical effects on learned parame...
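The orthonormal-design special case (\(X^\top X = I\)) illustrates the "typical effects on learned parameters" most cleanly, because both penalized problems then have closed forms in terms of the OLS coefficients. The scaling conventions in the comments (a \(\tfrac12\) on the squared loss and on the ridge penalty) are assumptions chosen to keep the formulas simple. Under them, lasso soft-thresholds the OLS coefficients, setting small ones exactly to zero. Ridge shrinks every coefficient by the same factor and zeroes none.

```python
import math


def lasso_orthonormal(b_ols, lam):
    # Closed form when X^T X = I for 0.5*||y - Xb||^2 + lam*||b||_1:
    # soft-thresholding moves each coefficient toward 0 by lam and
    # sets it exactly to 0 when |b| <= lam (sparsity).
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in b_ols]


def ridge_orthonormal(b_ols, lam):
    # Closed form when X^T X = I for 0.5*||y - Xb||^2 + 0.5*lam*||b||^2:
    # every coefficient is scaled by 1/(1 + lam); none becomes exactly 0.
    return [b / (1.0 + lam) for b in b_ols]
```

With OLS coefficients `[3.0, -0.5, 1.0]` and `lam = 1.0`, lasso keeps only the first coefficient (reduced to 2.0). Ridge returns all three, halved.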
Diagnose location-sorted recommender causing revenue drop
Eats recommendations were changed to rank items primarily by distance to the user; after launch, add-to-cart rate rose but revenue per session fell. D...
Design a robust fraud detection system
Real-Time Card Fraud Detector — End-to-End Design Context - Fraud base rate ≈ 0.2% (severe class imbalance) - Labels arrive with a 14-day delay (e.g.,...
Handle p≈n linear regression with L1
You must fit linear regression with p = 500 predictors and n = 600 observations. What failure modes do you expect and why does OLS overfit when p is c...
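The usual L1 remedy when \(p \approx n\) is the lasso, which can set many of the 500 coefficients exactly to zero and so reduce variance. To show what a solver does internally, here is plain cyclic coordinate descent with soft-thresholding. It assumes the \(\tfrac{1}{2n}\) objective scaling stated in the docstring and pre-centered data (no intercept). The names are hypothetical. For a real \(p = 500,\ n = 600\) problem you would more likely use scikit-learn's `Lasso`, which minimizes the same scaled objective with `alpha` in the role of `lam`, and choose the penalty by cross-validation with `LassoCV`.

```python
import math


def soft_threshold(z, lam):
    # Shrink z toward zero by lam; anything with |z| <= lam becomes exactly 0.
    return math.copysign(max(abs(z) - lam, 0.0), z)


def lasso_cd(X, y, lam, n_iter=200, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of rows; X and y are assumed centered, so no intercept is fit.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X b for the current b (all zeros at start)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep resid = y - X b in sync
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b
```

Consider a tiny example with two orthogonal columns, where OLS would give coefficients of 2.0 and 0.2. With a suitable `lam`, the weak column is set exactly to zero and the strong one is only shrunk.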
Design a house-price prediction model
Problem You are asked to build a model to predict house sale prices for a city of your choice. Data (assume typical real-estate fields) You have a his...