This question evaluates understanding of tree-based supervised learning methods—decision trees, random forests, and gradient-boosted trees—including key hyperparameters, bias–variance trade-offs, validation techniques such as out-of-bag estimation, suitability for high-dimensional sparse text features, and detection/mitigation of overfitting.
You are interviewing for a Data Scientist role and are asked to compare common tree-based methods for supervised learning (classification/regression), list key hyperparameters, and reason about bias–variance and validation.
(a) Explain decision trees, random forests (RF), and gradient-boosted trees (GBT). Then list the key hyperparameters for each.
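To ground part (a), a minimal sketch (assuming scikit-learn) that instantiates all three models with some of their commonly tuned hyperparameters; the specific values here are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Small synthetic dataset for illustration only.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Decision tree: depth and leaf size control model complexity.
tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)

# Random forest: number of trees and features considered per split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            max_depth=None, random_state=0)

# Gradient boosting: learning rate, number of (shallow) trees, subsampling.
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, subsample=0.8, random_state=0)

for model in (tree, rf, gbt):
    model.fit(X, y)
    print(type(model).__name__, round(model.score(X, y), 3))
```

An answer would then explain what each hyperparameter does (e.g. `max_depth` limits tree complexity, `learning_rate` scales each boosting step).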
(b) Explain why shallow trees are typical in boosting but deeper trees can be used in RF. Relate to bias–variance trade-offs.
(c) Define out-of-bag (OOB) estimation and when it can replace a validation set.
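For part (c), a sketch of OOB estimation using scikit-learn's `RandomForestClassifier` (whose `oob_score` option implements it): each tree sees only a bootstrap sample, so the roughly one-third of rows it never saw can score it without a separate validation split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# oob_score=True evaluates each sample using only the trees that did not
# see it during bootstrap sampling -- a built-in validation estimate.
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            bootstrap=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", round(rf.oob_score_, 3))
```

Note that OOB estimation requires bagging (bootstrap sampling), so it applies to RF but not to standard boosting.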
(d) For high-dimensional sparse text features, which model is preferable and why?
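To make the setting of part (d) concrete, a sketch (assuming scikit-learn) of what high-dimensional sparse text features look like; the vocabulary typically far exceeds the number of documents, and most entries are zero:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["trees split on features", "boosting fits residuals",
        "forests average many trees"]
X = TfidfVectorizer().fit_transform(docs)  # SciPy sparse matrix

# High-dimensional and sparse: many columns, mostly zero entries.
density = X.nnz / (X.shape[0] * X.shape[1])
print(X.shape, f"{density:.0%} nonzero")
```

An answer should reason about how axis-aligned splits on such features behave compared with models that weight many features at once.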
(e) Describe how you would detect and mitigate overfitting during boosting.
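For part (e), one common detection technique is to track validation performance per boosting round; a sketch using scikit-learn's `staged_predict` (the round-by-round settings here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

gbt = GradientBoostingClassifier(n_estimators=500, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbt.fit(X_tr, y_tr)

# staged_predict yields predictions after each boosting round, so we can
# watch validation accuracy and see where it plateaus or degrades.
val_acc = [(pred == y_val).mean() for pred in gbt.staged_predict(X_val)]
best_round = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
print("best number of rounds:", best_round)
```

Mitigations to discuss include early stopping (scikit-learn exposes `n_iter_no_change`), lowering the learning rate, shallower trees, and subsampling.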