Other Data Scientist Machine Learning Interview Questions
Derive and regularize logistic regression
Churn Propensity with Logistic Regression: Theory, Validation, and Decisions. Context: You are building a churn propensity model (y ∈ {0,1}) using logi...
Design anomaly detection and handle imbalanced logistic regression
You receive a time‑stamped transactions dataset: columns [event_time (UTC), customer_id, merchant_id, amount, country, device_type, features...], labe...
Extract companies from noisy text
Extracting Company Names from Noisy Resumes and Web Snippets. Context: You receive messy resume text (PDF-to-text/OCR, varying casing) and scraped web s...
Evaluate and select K in K-means
K-means Clustering: Concepts, Initialization, Model Selection, Preprocessing, and Business Validation. Context: You are clustering customer data with n...
Explain SVM kernels and complexity
Support Vector Machines – Core Concepts and Practice. You are interviewing for a Data Scientist role. Answer the following about Support Vector Machine...
Compare trees, RF, and gradient boosting
Decision Trees, Random Forests, and Gradient-Boosted Trees. You are interviewing for a Data Scientist role and are asked to compare common tree-based m...
Contrast L1 and L2 regularization effects
Ridge (L2) vs Lasso (L1) in Linear and Logistic Regression. Context: You are comparing L2 (Ridge) and L1 (Lasso) regularization for linear and logistic...
Tune metrics for imbalanced classification
Fraud Detection With Rare Positives (0.5%) and Messy Data. You are designing a supervised transaction-level fraud detector. Positives (fraud) are rare ...
Predict job changes month by month
Predict Monthly Job-Change Risk (Discrete-Time Survival Setup). Context: You are building a monthly model to predict the probability that a LinkedIn mem...
Detect clickbait without labels, then supervise
Detecting Clickbait Ads Without Labeled Data. Context: You are asked to detect clickbait ad creatives when there is no labeled training data. You have i...
Design a hybrid marketplace fraud system
Design a Fraud Detection System for a Marketplace and Profile Credentials. Context: You are a data scientist at a two‑sided marketplace where users can ...
Explain OS usage gap via trees
iOS vs. Android Usage Gap: Modeling, Causality, Telemetry, Missing Data, and Segmented Actions. Context: You observe that Instagram usage is substantial...