Explain and tune decision trees robustly
Company: Point72
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Take-home Project
You built a decision tree during an internship. Answer the following with crisp formulas and procedures:
1) Explain how a CART decision tree selects splits for classification vs. regression (impurity/variance criteria), including exact formulas for Gini, entropy, and MSE, and how surrogate splits work when features have missing values.
2) Give a defensible procedure to choose max_depth and min_samples_split: define a cross‑validation plan, early‑stopping/pruning (cost‑complexity α path), and the metric you would optimize under severe class imbalance (justify PR‑AUC vs. ROC‑AUC vs. F1). Include how you would pick α from the CCP path without leakage.
3) Overfitting checks: specify at least three diagnostics (e.g., cross‑validated gap vs. training, learning curves, permutation importance stability, calibration curves). What patterns flag overfitting for trees specifically?
4) With ~500k rows, ~300 features including high‑cardinality categoricals and sparse indicators, propose a preprocessing + modeling plan using a single decision tree: encoding choice, handling rare categories, monotonic constraints (if any), feature binning, and computational cost. Provide concrete hyperparameter ranges and expected training time order‑of‑magnitude.
5) If you could revisit the project, when would a random forest or a gradient‑boosted tree (e.g., XGBoost/LightGBM) outperform a single tree on this data? Name at least three data/target conditions and the trade‑offs (variance, interpretability, latency, OOB vs. CV, calibration). How would you compare models fairly (data splits, nested CV, fixed preprocessing, and identical evaluation protocol)?
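A strong answer to question 1 can state the criteria directly in code. The sketch below (function names are illustrative) implements the Gini impurity, Shannon entropy, and MSE formulas, plus CART's size-weighted score for a candidate split:

```python
import numpy as np

def gini(y):
    """Gini impurity: 1 - sum_k p_k^2 over class proportions p_k."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(y):
    """Shannon entropy: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mse(y):
    """Regression criterion: mean squared deviation from the node mean."""
    return np.mean((y - np.mean(y)) ** 2)

def weighted_split_impurity(y_left, y_right, criterion):
    """CART scores a split by the size-weighted impurity of the children;
    the chosen split maximizes the decrease from the parent's impurity."""
    n_l, n_r = len(y_left), len(y_right)
    n = n_l + n_r
    return (n_l / n) * criterion(y_left) + (n_r / n) * criterion(y_right)
```

For example, a 50/50 binary node has Gini 0.5 and entropy 1.0, and a split that perfectly separates the classes has weighted child impurity 0.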
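For question 2's leakage-free α selection, one defensible procedure is: compute the cost-complexity path on the training split only, cross-validate each candidate α inside that split with an imbalance-aware metric, then refit once. A minimal scikit-learn sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data stands in for the real project data.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 1) Compute the cost-complexity path on the TRAINING split only,
#    so the held-out test set never influences the choice of alpha.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
# Drop the last alpha (root-only tree), guard against tiny negative
# values from floating-point noise, and thin the path for speed.
alphas = np.clip(path.ccp_alphas[:-1], 0.0, None)[::10]

# 2) Cross-validate each candidate alpha inside the training split,
#    scoring with average precision (an estimate of PR-AUC) because
#    the positive class is rare.
cv_scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                    X_tr, y_tr, cv=5, scoring="average_precision").mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(cv_scores))]

# 3) Refit once on the full training split with the chosen alpha;
#    the test split is touched only for the final report.
final_tree = DecisionTreeClassifier(ccp_alpha=best_alpha,
                                    random_state=0).fit(X_tr, y_tr)
```

Because the path and the CV both see only `X_tr`, the test split plays no role in picking α, which is the leakage guarantee the question asks for.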
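One of question 3's diagnostics, the cross-validated train/CV gap, can be sketched as follows (synthetic data again): a gap that widens as depth grows while the CV score stalls or falls is the classic single-tree overfitting signature.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Train-vs-CV gap across depths: an unpruned tree drives training
# accuracy to 1.0 by memorizing noise, so its gap dominates.
gaps = {}
for depth in [2, 4, 8, None]:
    res = cross_validate(DecisionTreeClassifier(max_depth=depth, random_state=0),
                         X, y, cv=5, return_train_score=True)
    gaps[depth] = res["train_score"].mean() - res["test_score"].mean()
```

The same loop extends naturally to learning curves (vary `n_samples`) and to permutation-importance stability (repeat across folds and compare rankings).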
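For question 4's rare-category handling, a hypothetical pandas helper that buckets infrequent levels into a single sentinel before encoding, so the tree cannot split on noise-level categories:

```python
import pandas as pd

def collapse_rare(series: pd.Series, min_count: int = 50) -> pd.Series:
    """Replace categories seen fewer than min_count times with a
    single '__rare__' bucket (threshold and sentinel are illustrative)."""
    counts = series.value_counts()
    keep = set(counts[counts >= min_count].index)
    return series.where(series.isin(keep), "__rare__")
```

To avoid leakage, the `keep` set should be computed on the training folds only and then applied unchanged to validation and test data.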
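Question 5's fair-comparison protocol can be sketched as nested cross-validation with an identical scoring rule for every model: the inner loop tunes hyperparameters, the outer loop estimates generalization, and no model ever sees its outer test folds during tuning (scikit-learn, synthetic data, illustrative grids):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=15,
                           weights=[0.85, 0.15], random_state=0)

def nested_cv_score(estimator, param_grid):
    """Inner CV tunes hyperparameters; outer CV scores the tuned model.
    Using the same folds, metric, and grid discipline for every
    candidate is what makes the comparison fair."""
    inner = GridSearchCV(estimator, param_grid, cv=3,
                         scoring="average_precision")
    return cross_val_score(inner, X, y, cv=5,
                           scoring="average_precision").mean()

tree_score = nested_cv_score(DecisionTreeClassifier(random_state=0),
                             {"max_depth": [3, 6, None]})
forest_score = nested_cv_score(RandomForestClassifier(n_estimators=100,
                                                      random_state=0),
                               {"max_depth": [6, None]})
```

Any fixed preprocessing (e.g., the rare-category bucketing from question 4) would go inside a `Pipeline` passed as the estimator, so it is refit within each fold rather than on the full data.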
Quick Answer: This question evaluates a candidate's command of CART mechanics (split criteria and surrogate splits for missing values), principled hyperparameter tuning and cost-complexity pruning, overfitting diagnostics, scalable preprocessing for high-cardinality categoricals, and the conditions under which ensembles outperform a single tree.