Data Scientist Machine Learning Interview Questions
Compare Random Forests and Boosted Trees: Bias, Variance, Speed
Scenario: A product/data science team is deciding between Random Forests and Gradient-Boosted Decision Trees (e.g., XGBoost) for a new predictive task....
Build Model to Predict Customer Contract Renewal
Predicting Enterprise Customer Renewal for Google Meet. You are tasked with designing a model to predict whether an enterprise customer will renew thei...
Build Predictive Model for Product Metric: Steps Explained
Scenario: You are interviewing for a Data Scientist role and are asked to design a predictive model for a key product metric in a consumer app (e.g., p...
Determine Features for Effective Hashtag Recommendations
Hashtag Recommendation System Design. Context: You are designing a hashtag recommendation system for a social-media platform. Given a user u composing a...
How to Analyze and Model Behavioral Data Effectively?
End-to-End Conversion Modeling on a Raw Behavioral Dataset. Scenario: You receive a raw, event-level behavioral dataset (e.g., user actions, sessions, m...
Identify Unsupervised Techniques for Detecting Fraudulent Transactions
Unsupervised Fraud Detection: Modeling and Evaluation Without Labels. Scenario: You receive millions of historical transactions with no fraud labels. Ma...
Develop a Restaurant-Recommendation Engine with Logistic Regression
Restaurant Recommendation Engine: Metrics, Features, Model, and Evaluation. Scenario: You are designing a restaurant recommendation engine for a social ...
Identify Fake Accounts Using Machine Learning Techniques
Scenario: You are a data scientist at a social-commerce platform responsible for trust and safety. You need to design a system to detect and mitigate f...
Compare Logistic Regression and Random Forest in Limited Data Scenarios
Model Selection for Binary Classification with Limited Data and Potential Non-Linearities. Scenario: You are designing a binary classifier with limited ...
Optimize Surge Notifications for Rideshare Drivers
Scenario: A rideshare marketplace experiences airport demand spikes. When demand exceeds supply, the system can send surge-pricing push notifications t...
Optimize Email Strategy for New Prime Video Series Launch
Scenario: Designing, deploying, and evaluating ranking models and marketing emails for Prime Video. Question: How would you approach sending marketing e...
Engineer Features to Enhance Smartphone Battery Life Prediction
Battery Life Prediction with Sparse History. Problem: You are given sparse discharge traces that record battery percentage over elapsed time for prior u...
Optimize Churn Prediction: Feature Engineering and Model Selection
Weekly Churn Prediction (10M users): Feature Engineering, Model Choice, Explainability, and Debugging. Scenario: You own a weekly churn-prediction pipel...
Design a Regression Model for Robust Extrapolation Performance
Scenario: Onsite machine-learning exercise. Your task is to build a regression model using only numerical features that not only fits training data but...
Design an ML Model for Interview Recommendation Pipeline
Scenario: You are designing and deploying an ML model that mirrors a real-world recommendation pipeline serving a large product catalog with strict lat...
How to Architect a Personalized Ads Serving System
Full-Funnel Ads Serving System Design. Scenario: You are asked to architect a full-funnel advertising platform that serves personalized ads to users on ...
Evaluate and Experiment with Harmful Content Detection Model
Evaluating a Harmful-Content Detection Model: Offline and Online. Context: You are given a binary classification model that detects harmful content in a...
Evaluate Ensemble Models for Bias-Variance, Speed, and Interpretability
Large-Scale Recommendation System: Ensembles, Overfitting, Metrics, Architectures, and Optimization. Context: You are designing a large-scale recommenda...
Classify Reviewers Using Bayesian Probability for Accuracy Analysis
Scenario: Classifying reviewers as lazy or careful with limited labels. Context (completed): You are auditing a pool of reviewers who can be either: - La...
Design Framework for Robust House-Price Prediction Model
Model Robustness, Diagnostics, Random Forests, and Large-Scale Regression. Context: You are building and evaluating a supervised model to predict reside...