Explain and tune XGBoost; prevent overfitting
Company: TikTok
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
Explain XGBoost's tree booster in enough detail to answer the following (a worked gain formula and code sketches appear after the Quick Answer):

(a) What objective does it optimize, and how does the second-order Taylor approximation lead to the split "gain" formula? Explain the roles of lambda (L2), alpha (L1), gamma (min_split_loss), and the learning rate (eta) in that gain and in pruning.

(b) List the most impactful hyperparameters for tabular classification and, for each, give the expected direction of effect on bias, variance, and training time: max_depth, max_leaves, min_child_weight, subsample, colsample_bytree/colsample_bylevel, eta, n_estimators, lambda, alpha, gamma, max_delta_step, scale_pos_weight, monotone_constraints.

(c) You must train a model to flag "bad sellers" when the positive rate is 0.5%. Design a tuning plan that minimizes real business cost: specify a data-split strategy that avoids leakage (e.g., time- and seller-based splits), the primary offline metric (e.g., PR-AUC), how to choose an operating threshold from a cost matrix, how to apply early stopping robustly, and the diagnostics/plots you would produce to detect overfitting and data leakage.

(d) After training, how would you calibrate probabilities and interpret the model for investigators (e.g., with SHAP) while preventing attackers from reverse-engineering the rules?
Quick Answer: XGBoost fits each tree to a second-order Taylor expansion of a regularized loss; the closed-form leaf weights yield the split gain shown below, in which lambda shrinks every term, gamma prices each extra leaf (pruning), alpha soft-thresholds the gradient sums, and eta shrinks each finished tree's contribution without entering the gain. Depth, max_leaves, and min_child_weight trade bias against variance; row and column subsampling cut variance and training time; eta together with n_estimators sets the compute/generalization budget. For the 0.5%-positive seller task: split by time and hold out unseen sellers, optimize PR-AUC, early-stop on the leakage-safe fold, pick the alert threshold from the business cost matrix, then calibrate scores (isotonic or Platt, essential if scale_pos_weight was used) and give investigators SHAP-based reason codes rather than raw thresholds. The worked formula and sketches follow.
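For part (a), the derivation follows the XGBoost paper, with g_i and h_i the first and second derivatives of the loss and G, H their sums over a node's instances:

```latex
\[
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n}\Big[g_i f_t(x_i) + \tfrac{1}{2}\, h_i f_t(x_i)^2\Big]
  + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|
\]
With $\alpha = 0$, the optimal weight of leaf $j$ and the gain of a candidate split are
\[
w_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad
\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda}
  - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma .
\]
```

So lambda damps every term, gamma is the fixed price of the extra leaf (splits whose gain does not cover it are pruned), alpha replaces G_j with the soft-thresholded sgn(G_j) max(|G_j| - alpha, 0), and eta multiplies the finished tree's predictions, shrinking the step size without appearing in the gain.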
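For part (b), a minimal sketch of a defensible starting configuration; the values are illustrative assumptions to tune, and each comment notes the usual direction of effect:

```python
# Illustrative XGBoost starting point for imbalanced tabular classification.
# All values are assumptions, not recommendations.
params = {
    "objective": "binary:logistic",
    "tree_method": "hist",
    "max_depth": 6,             # deeper: lower bias, higher variance, slower
    "max_leaves": 0,            # >0 caps leaf count under grow_policy="lossguide"
    "min_child_weight": 10,     # min sum of hessian per child; larger: higher bias, lower variance
    "subsample": 0.8,           # <1: lower variance, faster per tree
    "colsample_bytree": 0.8,    # <1: decorrelates trees, lower variance
    "eta": 0.05,                # smaller: better generalization, needs more rounds (slower)
    "lambda": 1.0,              # L2 on leaf weights: shrinks weights and damps gain
    "alpha": 0.0,               # L1 on leaf weights: soft-thresholds gradient sums
    "gamma": 0.5,               # min loss reduction to split: prunes, raises bias
    "max_delta_step": 1,        # caps each leaf update; stabilizes rare-positive logistic fits
    "scale_pos_weight": 199,    # ~neg/pos at a 0.5% positive rate; boosts recall, skews calibration
    # "monotone_constraints": "(1,-1,0)",  # per-feature +1/-1/0 signs when domain knowledge demands it
    "eval_metric": "aucpr",     # PR-AUC tracks rare-positive ranking better than ROC-AUC
}
# n_estimators corresponds to num_boost_round in xgb.train; more rounds lower
# bias and raise variance and training time unless early stopping intervenes.
```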
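For part (c), a runnable sketch on synthetic data; the column names (seller_id, event_time), the positive-rate mechanism, and the cost matrix (500 per missed bad seller, 5 per false alert) are all assumptions:

```python
# Leakage-aware split, early stopping on PR-AUC, and a cost-based threshold.
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
n = 40_000
df = pd.DataFrame({
    "seller_id": rng.integers(0, 200_000, n),   # mostly-unique sellers
    "event_time": rng.uniform(0.0, 1.0, n),
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
})
# Rare positives (roughly 0.5-1%) driven by x1, standing in for "bad sellers".
rate = 0.005 * np.minimum(np.exp(df["x1"].to_numpy()), 5.0)
df["label"] = (rng.uniform(size=n) < rate).astype(int)

# Split by time first, then drop validation sellers seen in training so
# per-seller history cannot leak across the boundary.
train = df[df["event_time"] < 0.8]
valid = df[(df["event_time"] >= 0.8) & ~df["seller_id"].isin(train["seller_id"])]

feats = ["x1", "x2"]
dtrain = xgb.DMatrix(train[feats], label=train["label"])
dvalid = xgb.DMatrix(valid[feats], label=valid["label"])

history = {}
booster = xgb.train(
    {"objective": "binary:logistic", "eta": 0.05, "max_depth": 4,
     "eval_metric": "aucpr", "scale_pos_weight": 199.0},
    dtrain,
    num_boost_round=2000,
    evals=[(dtrain, "train"), (dvalid, "valid")],
    early_stopping_rounds=100,   # always stop on the leakage-safe fold
    evals_result=history,
    verbose_eval=False,
)
# Overfitting/leakage diagnostics: plot history["train"]["aucpr"] against
# history["valid"]["aucpr"] per round; a widening gap means overfitting,
# and a near-perfect validation curve is a classic leakage red flag.

p = booster.predict(dvalid, iteration_range=(0, booster.best_iteration + 1))
print("valid PR-AUC:", average_precision_score(valid["label"], p))

# Choose the alert threshold that minimizes expected business cost.
C_FN, C_FP = 500.0, 5.0   # assumed costs: missed bad seller vs. false alert
y = valid["label"].to_numpy()
grid = np.quantile(p, np.linspace(0.50, 0.999, 200))
cost = [C_FN * ((y == 1) & (p < t)).sum() + C_FP * ((y == 0) & (p >= t)).sum()
        for t in grid]
print("cost-minimizing threshold:", grid[int(np.argmin(cost))])
```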
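For part (d), a sketch that continues from the previous block (it reuses booster, valid, and feats); reusing the early-stopping fold for calibration is a shortcut here, and a separate calibration fold is preferable in practice:

```python
# Isotonic calibration plus TreeSHAP reason codes for investigators.
import numpy as np
import shap
import xgboost as xgb
from sklearn.isotonic import IsotonicRegression

# scale_pos_weight distorts predicted probabilities, so calibration on a
# held-out fold is required before scores are read as risk probabilities.
raw = booster.predict(xgb.DMatrix(valid[feats]),
                      iteration_range=(0, booster.best_iteration + 1))
iso = IsotonicRegression(out_of_bounds="clip")
calibrated = iso.fit_transform(raw, valid["label"])

# TreeSHAP decomposes each flagged case into per-feature contributions
# (in margin space for binary:logistic).
explainer = shap.TreeExplainer(booster)
contrib = explainer.shap_values(valid[feats])   # shape: (n_rows, n_features)

# Surface only coarse, aggregated reason codes (top features by |SHAP|),
# never raw thresholds, to make the rules harder to reverse-engineer.
case = int(np.argmax(calibrated))
for j in np.argsort(-np.abs(contrib[case]))[:2]:
    print(f"reason code: {feats[j]} (shap={contrib[case][j]:+.3f})")
```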