TikTok Machine Learning Interview Questions
Compare Random Forests and Boosted Trees: Bias, Variance, Speed
Scenario: A product/data science team is deciding between Random Forests and Gradient-Boosted Decision Trees (e.g., XGBoost) for a new predictive task....
Predict Customer Churn with Machine Learning Workflow
Predicting Monthly Churn: End-to-End Workflow. Scenario: A subscription platform wants to predict whether a customer will churn in the next month. Assum...
Design Real-Time Credit Card Fraud Detection System
Real-Time Credit-Card Fraud Detection System Design. Scenario: You are designing a real-time fraud detection system for an online payments platform that...
Design an ad-selection system across objectives
End-to-End Ad-Selection System Design. Context: You must choose, at impression time, which advertiser type to show to a user. There are three advertiser...
Detect and suppress bad sellers robustly
System Design: Identify and Suppress Bad Sellers in a Commerce Marketplace. Context: You are designing an ML-driven risk system for a large-scale market...
Explain and tune XGBoost; prevent overfitting
XGBoost Tree Booster: Objective, Hyperparameters, Tuning for Imbalanced Detection, and Post-training Use. Context: You are building a binary classifier...
Explain SHAP vs VIF under collinearity
High Collinearity in Binary Classification: VIF, SHAP, and Interpretation Strategy. You are modeling a binary outcome Y. Two numeric features A and B a...
Choose linear regression or decision tree appropriately
Choose Between Linear Regression and a Decision Tree Under a Hinge and Interaction DGP. Context: You have 100,000 i.i.d. observations with features x1 (...
Contrast LSTM and Transformer for long sequences
Train a Long-Context Autoregressive LM (T = 8192, H = 512, B = 8). You are training an autoregressive language model with: - Sequence length T = 8192 t...
Compare bagging vs boosting on imbalanced data
Fraud Detection on 10M Time-Ordered Transactions (0.5% Fraud). You are building a binary classifier to detect 0.5% fraudulent events among 10,000,000 t...
Estimate heterogeneous treatment effects with causal ML
Context: You are given large-scale, logged observational data from an always-on promotion. Each record contains features X (user/context), a binary tre...
Implement attention and nucleus sampling; compare to top-k
Implement Multi-Head Attention and Nucleus (Top-p) Sampling. Context: You are building core components used in Transformer-based language models. Implem...
Explain overfitting, imbalance, undersampling, and attention heads
Context: You are designing and evaluating production machine learning models, with emphasis on classification, reliability, and efficient architectures...
Personalize Ad Delivery Using Machine Learning Techniques
Personalized Delivery of Three Ad Categories. Scenario: You operate a consumer feed with a single ad opportunity per request and three possible ad categ...
Choose Between Random Forests and Gradient Boosting Models
Scenario: Product-facing data science interview on choosing and configuring tree-based ensemble models for tabular prediction in a production setting. ...
Predict User Churn with Effective Modeling Techniques
Predicting User Churn for a Subscription App. Context: You are building a model to predict which active subscribers are likely to churn soon so the team...