PracHub

Explain and tune XGBoost; prevent overfitting

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of XGBoost tree-boosting internals, hyperparameter impacts on bias/variance/training time, strategies for imbalanced binary classification and leakage-aware validation, post-training calibration, and interpretability for tabular fraud detection.



Company: TikTok

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen



Related Interview Questions

  • Design multimodal deployment under compute limits - TikTok (easy)
  • Explain overfitting, dropout, normalization, RL post-training - TikTok (medium)
  • Write self-attention and cross-entropy pseudocode - TikTok (medium)
  • Implement AUC-ROC, softmax, and logistic regression - TikTok (medium)
  • Answer ML fundamentals and diagnostics questions - TikTok (hard)
Oct 13, 2025, 9:49 PM

XGBoost Tree Booster: Objective, Hyperparameters, Tuning for Imbalanced Detection, and Post-training Use

Context: You are building a binary classifier with XGBoost (tree booster) to flag “bad sellers.” Positives are rare (≈0.5%). Answer the following:

(a) Objective, Second-Order Approximation, Split Gain, and Regularization Roles

  • What objective does XGBoost optimize for tree boosting?
  • How does the second-order Taylor approximation lead to the split "gain" formula?
  • Explain the roles of lambda (L2), alpha (L1), gamma (min_split_loss), and learning rate (eta) in split selection and pruning.
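For reference, the standard derivation an answer to (a) should reproduce (in the notation of the XGBoost paper, with $g_i$ and $h_i$ the first and second derivatives of the loss at the current prediction):

```latex
% Second-order approximation of the objective for tree t, with leaf
% weights w_j, leaf count T, and leaf index sets I_j:
\mathcal{L}^{(t)} \approx \sum_{j=1}^{T}\Big[ G_j w_j + \tfrac{1}{2}(H_j+\lambda) w_j^2 \Big] + \gamma T,
\qquad G_j=\sum_{i\in I_j} g_i,\quad H_j=\sum_{i\in I_j} h_i.
% Minimizing over each w_j gives the optimal leaf weight and objective:
w_j^{*} = -\frac{G_j}{H_j+\lambda},
\qquad
\mathcal{L}^{(t)*} = -\tfrac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda} + \gamma T.
% Hence the gain of splitting a leaf into children L and R:
\text{Gain} = \tfrac{1}{2}\left[\frac{G_L^{2}}{H_L+\lambda}
 + \frac{G_R^{2}}{H_R+\lambda}
 - \frac{(G_L+G_R)^{2}}{H_L+H_R+\lambda}\right] - \gamma.
```

In this form, lambda shrinks leaf weights and deflates every gain term; gamma is a fixed cost per split, so candidate splits with gain below zero are pruned; alpha (L1) soft-thresholds $G_j$ in the leaf-weight formula; and eta scales $w_j^{*}$ after the tree is built (shrinkage), without entering the gain itself.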

(b) Impactful Hyperparameters for Tabular Classification

For each hyperparameter, describe the expected direction of its effect on bias, variance, and training time:

  • max_depth, max_leaves, min_child_weight, subsample, colsample_bytree/level, eta, n_estimators, lambda, alpha, gamma, max_delta_step, scale_pos_weight, monotone_constraints.
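A concrete answer to (b) might organize the directions of effect as an annotated starting configuration. The values below are a hypothetical starting point for this imbalance level, not tuned settings; the comments summarize the expected direction of each effect.

```python
# Hypothetical starting configuration for tabular binary classification
# with ~0.5% positives; keys follow the sklearn-style XGBoost naming.
params = {
    "max_depth": 6,           # deeper -> lower bias, higher variance, slower
    "min_child_weight": 10,   # larger -> higher bias, lower variance
    "subsample": 0.8,         # <1 -> lower variance, faster per tree
    "colsample_bytree": 0.8,  # <1 -> lower variance, decorrelates trees
    "learning_rate": 0.05,    # smaller eta -> lower variance, needs more trees
    "n_estimators": 2000,     # more trees -> lower bias; pair with early stopping
    "reg_lambda": 1.0,        # L2: shrinks leaf weights, lower variance
    "reg_alpha": 0.0,         # L1: sparsifies leaf weights
    "gamma": 0.1,             # min split gain: prunes weak splits, higher bias
    "max_delta_step": 1,      # caps leaf updates; stabilizes under imbalance
    "scale_pos_weight": 199,  # ~ neg/pos ratio for a 0.5% positive rate
}

# Sanity check: 0.5% positives implies roughly 99.5 / 0.5 = 199 negatives
# per positive, the usual starting value for scale_pos_weight.
assert params["scale_pos_weight"] == round(99.5 / 0.5)
```

Note that `scale_pos_weight` reweights the gradient statistics rather than resampling, so it interacts with probability calibration in part (d): a heavily reweighted model's raw scores are not calibrated probabilities.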

(c) Tuning Plan for 0.5% Positive Rate (“Bad Sellers”)

  • Specify a data split strategy that avoids leakage (time- and seller-based splits).
  • Choose the primary offline metric and justify (e.g., PR-AUC).
  • Show how to set an operating threshold using a cost matrix.
  • Describe robust early stopping.
  • List diagnostics/plots to detect overfitting and data leakage.
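For the threshold-from-cost-matrix step, one simple approach is an exhaustive sweep over candidate thresholds on a validation set, scoring each by total misclassification cost. A minimal sketch, with toy placeholder scores and costs (the function name and cost values are illustrative, not from the question):

```python
def best_threshold(scores, labels, cost_fp=1.0, cost_fn=50.0):
    """Return the score threshold minimizing total misclassification cost
    on a validation set; cost_fp is a needless investigation, cost_fn a
    missed bad seller."""
    candidates = sorted(set(scores)) + [1.01]  # include "flag nobody"
    best_t, best_cost = None, float("inf")
    for t in candidates:
        cost = 0.0
        for s, y in zip(scores, labels):
            flagged = s >= t
            if flagged and y == 0:
                cost += cost_fp   # false positive
            elif not flagged and y == 1:
                cost += cost_fn   # false negative
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

# Toy validation set: two bad sellers scored high, two good sellers low.
t, c = best_threshold([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
# -> threshold 0.8 separates the classes at zero cost on this toy data
```

In production the sweep would use predicted probabilities from the leakage-safe validation split, and the chosen threshold should be re-checked after the calibration step in part (d), since calibration changes the score scale.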

(d) Post-training Calibration and Interpretation

  • How to calibrate probabilities.
  • How to interpret the model for investigators (e.g., SHAP).
  • How to prevent attackers from reverse-engineering rules while providing useful explanations.
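For the calibration step, isotonic regression (pool-adjacent-violators) is a common post-hoc choice for boosted trees when enough held-out positives exist; Platt scaling is the usual fallback for small calibration sets. A minimal pure-Python PAV sketch (fit on a held-out calibration split, never on training data):

```python
def pav_calibrate(labels_by_score):
    """Isotonic calibration via pool-adjacent-violators.
    labels_by_score: binary labels ordered by ascending model score.
    Returns a non-decreasing calibrated probability per example."""
    blocks = []  # [label_sum, count] per pooled block
    for y in labels_by_score:
        blocks.append([float(y), 1])
        # Merge while monotonicity is violated (prev mean >= current mean),
        # comparing means by cross-multiplication to avoid division.
        while len(blocks) > 1 and \
                blocks[-2][0] * blocks[-1][1] >= blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

# Labels ordered by ascending score; the mid-range violation is pooled.
probs = pav_calibrate([0, 1, 0, 1])  # -> [0.0, 0.5, 0.5, 1.0]
```

For the interpretability half of (d), per-decision SHAP values for trees are exact and fast (TreeSHAP), but investigator-facing explanations should expose grouped feature families rather than raw thresholds, so that the published rationale does not hand attackers a reproducible decision boundary.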

