Binary Classification Under Compute and Imbalance Constraints
Context
You are training an XGBoost model for a binary classification problem with:
- 1,000,000 rows, 100 features (20 numeric, 80 categorical that are one‑hot encoded)
- Positive class rate ≈ 1% (10,000 positives / 990,000 negatives)
- Hardware: single 16‑core CPU, 32 GB RAM
- Wall‑clock training time budget: ≤ 5 minutes
Assume you can provide a user_id to group rows (to prevent leakage in validation) and that features may contain missing values (NaNs). The one‑hot columns are sparse 0/1 indicators.
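To make the sketches that follow concrete, here is a hypothetical, scaled‑down stand‑in for this dataset; the names (`X`, `y`, `groups`), the NaN rate, the one‑hot density, and the 100× size reduction are all illustrative assumptions, not part of the problem statement.

```python
# Hypothetical synthetic stand-in for the dataset described above, scaled
# down 100x so the sketches below run quickly. Column names, user_id
# cardinality, and NaN rate are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 10_000  # stands in for 1,000,000 rows

# 20 numeric features, one of them with ~5% missing values
num = pd.DataFrame(rng.normal(size=(n_rows, 20)),
                   columns=[f"num_{i}" for i in range(20)])
num.loc[rng.random(n_rows) < 0.05, "num_0"] = np.nan

# 80 sparse 0/1 one-hot indicator columns
onehot = pd.DataFrame((rng.random((n_rows, 80)) < 0.02).astype(np.int8),
                      columns=[f"oh_{i}" for i in range(80)])

X = pd.concat([num, onehot], axis=1)
y = (rng.random(n_rows) < 0.01).astype(int)          # ≈ 1% positive rate
groups = rng.integers(0, n_rows // 5, size=n_rows)   # user_id, ~5 rows/user
```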
Tasks
- Propose initial XGBoost hyperparameters (eta/learning_rate, max_depth, min_child_weight, subsample, colsample_bytree, lambda, alpha, n_estimators, max_bin or tree_method) and justify each in terms of the bias–variance trade‑off, class imbalance, and the compute constraints. (A starter configuration is sketched after this list.)
- Describe an efficient tuning strategy: the search space, early stopping, and a cross‑validation scheme that prevents leakage from users appearing in multiple folds. (See the grouped‑CV sketch below.)
- Explain exactly how XGBoost handles missing values during tree splitting, and how that behavior interacts with one‑hot encoding versus target encoding. (A small demonstration follows.)
- Given severe minority‑class scarcity, compare using scale_pos_weight vs a weighted loss (per‑row sample weights) vs focal loss; when would each be preferable? (All three are sketched below.)
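For the first task, one plausible starting configuration under the 16‑core / 5‑minute budget might look like the sketch below; every value is an assumption to be revisited during tuning, not a prescribed answer.

```python
# Hypothetical starter configuration; every value is an assumption to be
# revisited during tuning, not a recommendation from the problem statement.
params = {
    "objective": "binary:logistic",
    "tree_method": "hist",       # histogram splits: the fast CPU path
    "max_bin": 256,              # default bin count; fewer bins trade accuracy for speed
    "eta": 0.1,                  # moderate learning rate for a limited round budget
    "max_depth": 6,              # shallow trees: lower variance, cheaper per round
    "min_child_weight": 10,      # blocks splits supported by only a few positives
    "subsample": 0.8,            # row subsampling: variance reduction and speed
    "colsample_bytree": 0.8,     # column subsampling across the 100 features
    "lambda": 1.0,               # L2 regularization (XGBoost default)
    "alpha": 0.0,                # L1 off to start
    "scale_pos_weight": 99.0,    # ≈ negatives / positives at a 1% positive rate
    "eval_metric": "aucpr",      # PR AUC is more informative than accuracy at 1%
    "nthread": 16,
}
# n_estimators: set a high cap (e.g. 1000 rounds) and let early stopping pick.
```

The histogram method with a fixed `max_bin` keeps per‑iteration cost roughly linear in the number of rows, which is what makes the 5‑minute budget plausible on this hardware.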
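For the tuning task, a minimal leakage‑free loop (assuming the `X`, `y`, `groups`, and `params` defined above) combines GroupKFold on user_id with early stopping; a random search over max_depth, min_child_weight, subsample, and colsample_bytree would wrap this loop.

```python
# Minimal sketch of leakage-free evaluation: group-aware folds on user_id
# plus early stopping. Reuses the synthetic X, y, groups, params from above.
import xgboost as xgb
from sklearn.model_selection import GroupKFold

cv_scores = []
for train_idx, valid_idx in GroupKFold(n_splits=5).split(X, y, groups):
    dtrain = xgb.DMatrix(X.iloc[train_idx], label=y[train_idx])
    dvalid = xgb.DMatrix(X.iloc[valid_idx], label=y[valid_idx])
    booster = xgb.train(
        params, dtrain,
        num_boost_round=1000,        # high cap; early stopping picks the round
        evals=[(dvalid, "valid")],
        early_stopping_rounds=50,    # stop when validation aucpr stalls
        verbose_eval=False,
    )
    cv_scores.append(booster.best_score)
print(f"mean valid aucpr: {sum(cv_scores) / len(cv_scores):.4f}")
```

Because no user_id appears in both a training and a validation fold, the early‑stopping round count is chosen against genuinely unseen users.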
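For the missing‑value task, the mechanism to explain is that XGBoost evaluates each candidate split with the NaN rows sent left and then right, keeps the better ("default") direction, and never imputes. A one‑hot 0/1 indicator carries no NaNs to route (missingness was already baked into the encoding), whereas a target‑encoded column can retain NaNs that receive this learned routing. A toy demonstration of the learned branch, with illustrative names:

```python
# Tiny demonstration that XGBoost learns a default direction for NaNs at
# each split rather than imputing them. min_child_weight=0 only so this
# six-row toy example is still allowed to split.
import numpy as np
import xgboost as xgb

X_demo = np.array([[0.1], [0.2], [np.nan], [0.9], [np.nan], [0.8]])
y_demo = np.array([0, 0, 1, 1, 1, 1])
d = xgb.DMatrix(X_demo, label=y_demo)
b = xgb.train({"objective": "binary:logistic", "max_depth": 2,
               "min_child_weight": 0}, d, num_boost_round=1)
print(b.get_dump()[0])  # each split node shows a "missing=..." child id
```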
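For the imbalance task, the three options differ in mechanism: scale_pos_weight is a single global multiplier on positive‑class gradients, per‑row weights allow arbitrary cost‑sensitive schemes, and focal loss is not built into XGBoost so it must be supplied as a custom objective. A hedged sketch of all three, reusing `X`, `y`, and `params` from above; gamma=2 and the finite‑difference Hessian are assumptions made for brevity:

```python
# Sketch of the three imbalance options. scale_pos_weight and per-row
# weights are native XGBoost features; focal loss is not built in, so it
# is supplied as a custom objective (preds are raw margins here).
import numpy as np
import xgboost as xgb

# Option 1: one global multiplier on positive-class gradients.
pos, neg = (y == 1).sum(), (y == 0).sum()
params_spw = {**params, "scale_pos_weight": neg / max(pos, 1)}

# Option 2: arbitrary per-row weights, e.g. cost-sensitive or recency-based.
w = np.where(y == 1, neg / max(pos, 1), 1.0)
dtrain_w = xgb.DMatrix(X, label=y, weight=w)

# Option 3: focal loss with gamma=2 (assumed), down-weighting easy examples.
def focal_obj(preds, dtrain, gamma=2.0):
    y_true = dtrain.get_label()

    def grad_at(z):
        # Exact dL/dz of the focal loss, elementwise in the margin z.
        p = 1.0 / (1.0 + np.exp(-z))
        log_p = np.log(np.clip(p, 1e-12, 1.0))
        log_1p = np.log(np.clip(1.0 - p, 1e-12, 1.0))
        g_pos = gamma * p * (1 - p) ** gamma * log_p - (1 - p) ** (gamma + 1)
        g_neg = p ** (gamma + 1) - gamma * (1 - p) * p ** gamma * log_1p
        return np.where(y_true == 1, g_pos, g_neg)

    eps = 1e-4
    grad = grad_at(preds)
    # Central finite difference stands in for the (messy) analytic Hessian.
    hess = (grad_at(preds + eps) - grad_at(preds - eps)) / (2 * eps)
    return grad, np.maximum(hess, 1e-12)  # keep the Hessian positive

booster_focal = xgb.train({"tree_method": "hist", "max_depth": 6},
                          xgb.DMatrix(X, label=y),
                          num_boost_round=50, obj=focal_obj)
```

As a rough rule of thumb: scale_pos_weight when a single ranking‑oriented knob suffices, per‑row weights when misclassification costs vary by example, and focal loss when training is dominated by a sea of easy negatives.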