TikTok Data Scientist Machine Learning Interview Questions
Compare Random Forests and Boosted Trees: Bias, Variance, Speed
Scenario: A product/data science team is deciding between Random Forests and Gradient-Boosted Decision Trees (e.g., XGBoost) for a new predictive task....
Predict Customer Churn with Machine Learning Workflow
Predicting Monthly Churn: End-to-End Workflow. Scenario: A subscription platform wants to predict whether a customer will churn in the next month. Assum...
Design Real-Time Credit Card Fraud Detection System
Real-Time Credit-Card Fraud Detection System Design. Scenario: You are designing a real-time fraud detection system for an online payments platform that...
Design an ad-selection system across objectives
End-to-End Ad-Selection System Design. Context: You must choose, at impression time, which advertiser type to show to a user. There are three advertiser...
Detect and suppress bad sellers robustly
System Design: Identify and Suppress Bad Sellers in a Commerce Marketplace. Context: You are designing an ML-driven risk system for a large-scale market...
Explain and tune XGBoost; prevent overfitting
XGBoost Tree Booster: Objective, Hyperparameters, Tuning for Imbalanced Detection, and Post-training Use. Context: You are building a binary classifier...
Explain SHAP vs VIF under collinearity
High Collinearity in Binary Classification: VIF, SHAP, and Interpretation Strategy. You are modeling a binary outcome Y. Two numeric features A and B a...
Choose linear regression or decision tree appropriately
Choose Between Linear Regression and a Decision Tree Under a Hinge and Interaction DGP. Context: You have 100,000 i.i.d. observations with features x1 (...
Contrast LSTM and Transformer for long sequences
Train a Long-Context Autoregressive LM (T = 8192, H = 512, B = 8). You are training an autoregressive language model with: - Sequence length T = 8192 t...
Compare bagging vs boosting on imbalanced data
Fraud Detection on 10M Time-Ordered Transactions (0.5% Fraud). You are building a binary classifier to detect 0.5% fraudulent events among 10,000,000 t...
Estimate heterogeneous treatment effects with causal ML
Context: You are given large-scale, logged observational data from an always-on promotion. Each record contains features X (user/context), a binary tre...
Predict User Churn with Effective Modeling Techniques
Predicting User Churn for a Subscription App. Context: You are building a model to predict which active subscribers are likely to churn soon so the team...
Personalize Ad Delivery Using Machine Learning Techniques
Personalized Delivery of Three Ad Categories. Scenario: You operate a consumer feed with a single ad opportunity per request and three possible ad categ...
Choose Between Random Forests and Gradient Boosting Models
Scenario: Product-facing data science interview on choosing and configuring tree-based ensemble models for tabular prediction in a production setting. ...