Amazon Data Scientist Machine Learning Interview Questions
Optimize Email Strategy for New Prime Video Series Launch
Scenario: Designing, deploying, and evaluating ranking models and marketing emails for Prime Video. Question: How would you approach sending marketing e...
Design an ML Model for Interview Recommendation Pipeline
Scenario: You are designing and deploying an ML model that mirrors a real-world recommendation pipeline serving a large product catalog with strict lat...
Evaluate Ensemble Models for Bias-Variance, Speed, and Interpretability
Large-Scale Recommendation System: Ensembles, Overfitting, Metrics, Architectures, and Optimization. Context: You are designing a large-scale recommenda...
Explain Decision-Tree Training and Clustering Algorithms
Decision Trees and Clustering: Training Mechanics and Core Principles. Context: Technical/phone screen for an Applied Scientist/Data Scientist role, ass...
Choose Models for Imbalanced Data and Time-Series Forecasting
Scenario: You must choose and tune models for (a) forecasting marketplace demand with seasonality and trend, and (b) detecting fraud where the positive...
Design an Automated Home-Price Valuation Model
Scenario: You are building an automated house-price valuation service for a real-estate platform. Question: Design a home-price estimation system. Walk ...
Diagnose Bias–Variance Trade-off in Supervised Learning
Supervised Learning Review (Customer-Facing Ranking Context). You are designing and evaluating models for a customer-facing ranking service (e.g., orde...
Handle Missing Values and Choose ML Algorithms Wisely
ML Interview: Core Modeling Concepts. Context: Technical phone screen for a Data Scientist role. Assume primarily tabular datasets; address both classi...
Compare RNNs and Transformers for Long-Sequence Text Classification
Scenario: You are designing a long-sequence text classification system under tight inference latency constraints (e.g., large documents or logs that mu...
Optimize XGBoost for Predicting Marketing Outcomes
Gradient-Boosted Trees for Marketing Outcome Prediction. Context: You’re building a model to predict a marketing outcome (e.g., likelihood of conversion...
Design a Machine Learning Recommendation System Pipeline
System Design: End-to-End ML Recommendation System. Scenario: You are building an end-to-end machine-learning-powered recommendation system for a large ...
Optimize Feature Selection and Handling in Machine Learning Models
Scenario: You are building a customer propensity model to predict the probability that a user will take a desired action (e.g., purchase, subscribe). Y...
Optimize Predictive Analytics: Feature Engineering to Model Evaluation
End-to-End Predictive Analytics Project Walkthrough. Context: You are interviewing for a Data Scientist role. The interviewer asks you to walk through a...
Compare Regularization Techniques and Their Use Cases
Technical Phone Screen: Model Evaluation, Regularization, and Regression Basics. Instructions: Answer the following, focusing on clarity and practical i...
Build Accurate Energy Consumption Prediction Model for Utilities
Predicting Daily Energy Consumption: End-to-End Regression to Production. Context: You need to build and productionize a supervised regression model tha...
Implement Batch Gradient Descent for Linear Regression
Batch Gradient Descent for Linear Regression (MSE). Scenario: You are building a linear regression model from scratch and will optimize its parameters u...
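As a rough illustration of what this question is asking for, here is a minimal pure-Python sketch of full-batch gradient descent on the MSE loss. The function name `batch_gradient_descent`, the learning rate, the epoch count, and the toy data are all assumptions made for this example; the full question text is truncated above and may specify different requirements.

```python
def batch_gradient_descent(X, y, lr=0.05, epochs=2000):
    """Fit y ~ X.w + b by minimizing mean squared error.

    Every update uses the gradient computed over the whole dataset,
    which is what makes this "batch" (not stochastic or mini-batch).
    lr and epochs are illustrative defaults, not tuned values.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        # Residuals (prediction - target) for every row in the dataset.
        errors = [
            sum(wj * xj for wj, xj in zip(w, row)) + b - yi
            for row, yi in zip(X, y)
        ]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = [
            (2.0 / n) * sum(e * row[j] for e, row in zip(errors, X))
            for j in range(d)
        ]
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1 (no noise).
X_toy = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_toy = [1.0, 3.0, 5.0, 7.0, 9.0]
w_fit, b_fit = batch_gradient_descent(X_toy, y_toy)
print(w_fit, b_fit)
```

On unscaled features with a wide range, a fixed learning rate like this can diverge, so standardizing inputs or lowering `lr` is a common first adjustment.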
Design a Robust Traffic Forecasting Pipeline
Forecasting Daily Amazon Retail Traffic: End-to-End Design. You are given 5 years of daily Amazon retail site traffic counts. Design an end-to-end fore...
Explain Random Forests, Bagging, and Evaluation
Random Forests, Bagging vs Boosting, and Practical Model Validation. You are building a supervised learning model on tabular data. Explain and compare ...
Diagnose and Fix Underperforming ML Model
Rapidly Improving Recall Under Class Imbalance (One-Day Plan). Context: You inherit a binary fraud detection model with severe class imbalance (positive...
Optimize Precision–Recall Under Class Imbalance
You have extreme class imbalance (positive rate ~1%). You score 12 examples as follows (id, true_label, score): A,1,0.92; B,0,0.90; C,0,0.88; D,0,0.70...
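The mechanics behind this question can be sketched as a small precision/recall calculation at a chosen score threshold. The helper name `precision_recall` and the thresholds are assumptions for illustration. Only the four rows visible in the truncated listing (A to D) are used here, so the numbers this produces describe that partial subset and not the full 12-example problem.

```python
def precision_recall(rows, threshold):
    """Return (precision, recall) when predicting positive for score >= threshold.

    rows is a list of (id, true_label, score) tuples. An undefined ratio
    (no predicted positives, or no actual positives) is reported as 0.0,
    which is a convention chosen for this sketch.
    """
    tp = sum(1 for _, label, score in rows if score >= threshold and label == 1)
    fp = sum(1 for _, label, score in rows if score >= threshold and label == 0)
    fn = sum(1 for _, label, score in rows if score < threshold and label == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Only the rows visible in the truncated question; the remaining eight are unknown.
visible_rows = [("A", 1, 0.92), ("B", 0, 0.90), ("C", 0, 0.88), ("D", 0, 0.70)]

for t in (0.95, 0.90, 0.80):
    print(t, precision_recall(visible_rows, t))
```

Sweeping the threshold this way traces out the precision-recall trade-off that the question title refers to; with a positive rate near 1%, that curve is usually more informative than accuracy.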