Random Forests, Bagging vs Boosting, and Practical Model Validation
You are building a supervised learning model on tabular data. Explain and compare ensemble methods, evaluation, and validation choices for Random Forests and related approaches.
A. Random Forest Aggregation and Feature Subsampling
- How does a Random Forest classifier aggregate predictions from bootstrapped decision trees? Describe bootstrapping and the aggregation rule for classification vs regression.
- How does feature subsampling at each split reduce correlation between trees, and why does that matter for variance reduction? (A minimal sketch of both mechanisms follows below.)
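To make bootstrapping, per-split feature subsampling, and the aggregation rule concrete, here is a minimal sketch using scikit-learn decision trees on synthetic data; it is illustrative only, not a substitute for RandomForestClassifier (for regression, the majority vote would be replaced by an average of tree predictions).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    # Bootstrapping: draw n row indices with replacement for each tree.
    idx = rng.integers(0, len(X), size=len(X))
    # Feature subsampling: max_features="sqrt" considers a random subset of
    # features at every split, which decorrelates the trees.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# Aggregation for classification: majority vote across the trees.
votes = np.stack([t.predict(X) for t in trees])      # shape (n_trees, n_samples)
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("accuracy of the vote on the training rows:", (majority == y).mean())
```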
B. Bagging vs Boosting
- Conceptually contrast bagging (e.g., Random Forests) with boosting (e.g., XGBoost/LightGBM).
- Compare them in bias–variance terms and discuss typical overfitting/robustness behavior. (A side-by-side sketch follows below.)
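A minimal side-by-side sketch, assuming scikit-learn and using HistGradientBoostingClassifier as a stand-in for XGBoost/LightGBM; the synthetic data and settings are illustrative, not tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Bagging: deep trees grown independently on bootstraps, then averaged (variance reduction).
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
# Boosting: shallow trees fitted sequentially, each correcting the previous ones (bias reduction).
gb = HistGradientBoostingClassifier(max_iter=300, random_state=0).fit(X_tr, y_tr)

for name, model in [("random forest", rf), ("gradient boosting", gb)]:
    print(name, "ROC-AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```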
C. Key Hyperparameters and Their Effects
Discuss the following hyperparameters and how they affect bias, variance, computation, and class imbalance handling (a short sketch follows the list):
- n_estimators
- max_depth
- max_features (a.k.a. mtry)
- min_samples_leaf
- class_weight
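A minimal sketch showing where these hyperparameters appear in scikit-learn's RandomForestClassifier; the values are illustrative starting points, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=500,         # more trees: lower variance, more compute, no added overfitting risk
    max_depth=None,           # deeper trees: lower bias, higher per-tree variance
    max_features="sqrt",      # mtry: fewer candidate features per split decorrelates trees
    min_samples_leaf=5,       # larger leaves: smoother predictions, more bias, less variance
    class_weight="balanced",  # reweights classes inversely to frequency for imbalance
    n_jobs=-1,
    random_state=0,
)
```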
D. Out-of-Bag (OOB) Error Estimation
- What is OOB error and how is it computed?
- When is OOB reliable, and what are its limitations, especially with heavy class imbalance (e.g., a 1% positive rate) or time-series data? (A short sketch follows below.)
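A minimal OOB sketch, assuming scikit-learn: each tree is evaluated on the roughly one third of rows left out of its bootstrap sample, giving a nearly free generalization estimate without a separate validation set. The imbalanced synthetic data is only there to illustrate the caveat in the second question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with roughly a 1% positive rate to illustrate the imbalance caveat.
X, y = make_classification(n_samples=2000, weights=[0.99, 0.01], random_state=0)

rf = RandomForestClassifier(n_estimators=500, oob_score=True, n_jobs=-1, random_state=0)
rf.fit(X, y)

# oob_score_ is accuracy by default; with 99% negatives it can look excellent even
# if the model never finds a positive. For time series, OOB rows are not "future"
# rows, so the estimate can be optimistic there as well.
print("OOB accuracy:", rf.oob_score_)
print("OOB class-probability matrix shape:", rf.oob_decision_function_.shape)
```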
E. Evaluation: Classification vs Regression
- Recommend metrics for classification (e.g., ROC-AUC, PR-AUC, log loss, accuracy) and explain when accuracy is misleading.
- Recommend metrics for regression (e.g., RMSE, MAE, R²) and explain when R² is misleading. (A metrics sketch follows below.)
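A minimal sketch of computing the metrics named above with scikit-learn; the tiny arrays are hypothetical placeholders standing in for real predictions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score, log_loss,
                             mean_absolute_error, mean_squared_error,
                             r2_score, roc_auc_score)

# Classification: prefer ranking/probabilistic metrics; accuracy shown for contrast.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 0, 0, 0])
y_prob = np.array([0.1, 0.2, 0.05, 0.3, 0.8, 0.4, 0.6, 0.2, 0.1, 0.15])
print("ROC-AUC :", roc_auc_score(y_true, y_prob))
print("PR-AUC  :", average_precision_score(y_true, y_prob))  # baseline equals the positive rate
print("log loss:", log_loss(y_true, y_prob))
print("accuracy:", accuracy_score(y_true, (y_prob >= 0.5).astype(int)))  # misleading under imbalance

# Regression: error magnitude (RMSE, MAE) plus explained variance (R²).
y_r = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.8, 5.4, 2.9, 6.5])
print("RMSE:", mean_squared_error(y_r, y_hat) ** 0.5)
print("MAE :", mean_absolute_error(y_r, y_hat))
print("R²  :", r2_score(y_r, y_hat))  # misleading when the target has little variance
```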
F. Handling 1% Positive-Rate Imbalance
Describe practical steps for the following (a combined sketch follows the list):
- Threshold selection (including cost-sensitive thresholds or top-k selection)
- Cost-sensitive learning (e.g., class_weight)
- Calibrated probabilities
- Evaluation with PR curves (and interpretation of the baseline)
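A combined sketch of these steps under a roughly 1% positive rate, assuming scikit-learn; the misclassification costs (missed positive = 50, false alarm = 1) and the top-k budget of 100 are hypothetical values used only to illustrate the mechanics.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.3, random_state=0)

# Cost-sensitive learning via class_weight, then isotonic calibration on CV folds
# so the predicted probabilities are usable for thresholding.
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced_subsample",
                            n_jobs=-1, random_state=0)
clf = CalibratedClassifierCV(rf, method="isotonic", cv=3).fit(X_tr, y_tr)
p = clf.predict_proba(X_te)[:, 1]

# PR evaluation: the no-skill baseline is the positive rate (~1%), not 0.5.
print("PR-AUC:", average_precision_score(y_te, p), "| baseline:", y_te.mean())

# Cost-sensitive threshold with hypothetical costs: flag whenever the expected
# cost of ignoring a case exceeds the cost of acting on it.
cost_fn, cost_fp = 50.0, 1.0
threshold = cost_fp / (cost_fp + cost_fn)
print("threshold:", threshold, "| flagged:", int((p >= threshold).sum()))

# Top-k alternative: when capacity allows only k investigations, take the k highest scores.
top_k = p.argsort()[::-1][:100]
print("positives among top 100:", int(y_te[top_k].sum()))
```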
G. Feature Importance and Pitfalls
Explain the following (a permutation-importance sketch follows the list):
- Impurity-based importance and its biases
- Permutation importance (including OOB or validation-based)
- Grouped/conditional permutations for correlated features
- Leakage pitfalls and how to avoid them
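A minimal sketch contrasting impurity-based and permutation importance on a held-out split, assuming scikit-learn and synthetic data; grouped/conditional permutations for correlated features would shuffle blocks of related columns together rather than one column at a time, which this sketch does not show.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=15, n_informative=5, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0).fit(X_tr, y_tr)

# Impurity-based importance: computed from training-time splits and biased toward
# high-cardinality / continuous features.
print("top features by impurity:", np.argsort(rf.feature_importances_)[::-1][:5])

# Permutation importance on held-out data: the drop in a chosen score when a
# feature's values are shuffled. A feature that dominates both rankings far too
# strongly is a common symptom of leakage and worth auditing.
result = permutation_importance(rf, X_va, y_va, n_repeats=10,
                                scoring="roc_auc", random_state=0, n_jobs=-1)
print("top features by permutation:", np.argsort(result.importances_mean)[::-1][:5])
```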
H. When Random Forests Underperform vs Gradient Boosting
- Why and when might Random Forests underperform compared to XGBoost/LightGBM?
- Provide a scenario where Random Forests are preferable.
I. Concrete Validation Plan for a Tabular Dataset
Provide a step-by-step, reproducible plan to validate a Random Forest (a code skeleton follows the list), including:
- Train/validation/test split strategy (i.i.d. vs time-based)
- Cross-validation setup
- Early-stopping proxies for Random Forests
- Threshold tuning, probability calibration, and final evaluation
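A skeleton of such a plan as code, assuming scikit-learn and an i.i.d. dataset; for time-ordered data, swap StratifiedKFold for TimeSeriesSplit and make the final test split the most recent period. The sizes, fold counts, and metric are illustrative defaults, not prescriptions.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)

# 1) Hold out a final test set that is touched exactly once, at the very end.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# 2) Stratified cross-validation on the development set preserves the positive rate per fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=5,
                            class_weight="balanced", n_jobs=-1, random_state=0)
scores = cross_val_score(rf, X_dev, y_dev, cv=cv, scoring="average_precision")
print("CV PR-AUC:", scores.mean(), "+/-", scores.std())

# 3) Early-stopping proxy: monitor OOB score (oob_score=True) as n_estimators grows to
#    pick a tree count; forests do not overfit from adding trees, they only cost more.
#    (Not run here to keep the sketch short.)

# 4) Calibrate probabilities inside the CV folds, refit on all development data,
#    tune the decision threshold on held-out predictions, then score the test set once.
final = CalibratedClassifierCV(rf, method="isotonic", cv=cv).fit(X_dev, y_dev)
p_test = final.predict_proba(X_test)[:, 1]
print("final test PR-AUC:", average_precision_score(y_test, p_test))
```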