PracHub

Explain random forests, bagging, and evaluation

Last updated: Mar 29, 2026

Quick Overview

This question tests understanding of ensemble learning and model evaluation for supervised learning on tabular data: Random Forest aggregation, feature subsampling, bagging versus boosting, hyperparameter effects, out-of-bag (OOB) error, class-imbalance handling, feature-importance interpretation, and validation strategies. Interviewers use it to assess a candidate's reasoning about bias–variance trade-offs, robustness, and evaluation choices for production-ready models. It requires both conceptual understanding and practical application.

Company: Amazon

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

Explain how a Random Forest classifier aggregates bootstrapped decision trees and how feature subsampling reduces correlation. Contrast bagging with boosting conceptually and in bias–variance terms. Cover:

  • Key hyperparameters (n_estimators, max_depth, max_features, min_samples_leaf, class_weight) and their effects.
  • Out-of-bag (OOB) error estimation: what it is, when it’s reliable, and limitations with heavy class imbalance or time-series.
  • Classification vs regression evaluation: choose metrics for each (e.g., ROC-AUC, PR-AUC, log loss, RMSE) and when accuracy or R² is misleading.
  • Handling 1% positive-rate imbalance: threshold selection, cost-sensitive learning, calibrated probabilities, and evaluation on PR curves.
  • Feature importance: impurity-based biases, permutation importance, grouped permutations, and leakage pitfalls.
  • When RF underperforms vs gradient boosting (e.g., XGBoost/LightGBM) and why; give a scenario where RF is preferable.

Provide a concrete plan to validate a Random Forest on a tabular dataset, including cross-validation, early stopping proxies, and a reproducible train/validation/test split.

Related Interview Questions

  • Predicting the Next Elevator Call Location - Amazon (medium)
  • Explain Transformer and MoE Fundamentals - Amazon (medium)
  • Explain Core ML Interview Concepts - Amazon (hard)
  • Evaluate NLP Classification Models - Amazon (easy)
  • Explain overfitting, regularization, and LLM techniques - Amazon (medium)
Oct 13, 2025, 9:49 PM

Random Forests, Bagging vs Boosting, and Practical Model Validation

You are building a supervised learning model on tabular data. Explain and compare ensemble methods, evaluation, and validation choices for Random Forests and related approaches.

A. Random Forest Aggregation and Feature Subsampling

  1. How does a Random Forest classifier aggregate predictions from bootstrapped decision trees? Describe bootstrapping and the aggregation rule for classification vs regression.
  2. How does feature subsampling at each split reduce correlation between trees, and why does that matter for variance reduction?
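
As a starting point for part A, here is a minimal sketch of the two building blocks the question names: drawing a bootstrap sample and aggregating per-tree outputs. The function names are hypothetical and the "trees" are represented only by their predictions; a real forest would fit one decision tree per bootstrap sample.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rows[rng.randrange(len(rows))] for _ in rows]

def aggregate_classification(votes):
    """Hard majority vote over the class predicted by each tree."""
    return Counter(votes).most_common(1)[0][0]

def aggregate_regression(predictions):
    """Average the numeric prediction of each tree."""
    return sum(predictions) / len(predictions)

rng = random.Random(0)
rows = list(range(10))
replicate = bootstrap_sample(rows, rng)  # same size as rows, duplicates allowed

majority = aggregate_classification(["spam", "ham", "spam"])
average = aggregate_regression([2.0, 3.0, 4.0])
```

Note that some implementations, scikit-learn among them, average the per-tree class probabilities instead of counting hard votes, so the hard vote above is one common convention rather than the only one.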

B. Bagging vs Boosting

  1. Conceptually contrast bagging (e.g., Random Forests) with boosting (e.g., XGBoost/LightGBM).
  2. Compare them in bias–variance terms and discuss typical overfitting/robustness behavior.
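
A standard identity helps frame the variance side of both A.2 and B.2. The symbols are introduced here and are not part of the question: assume B identically distributed trees T_b, each with variance σ² at a point x, and pairwise correlation ρ between any two trees.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Under these assumptions, adding trees shrinks only the second term, while feature subsampling targets the first term by lowering ρ. Boosting, by contrast, fits trees sequentially to the remaining errors, so its main lever is bias reduction rather than averaging away variance.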

C. Key Hyperparameters and Their Effects

Discuss the following hyperparameters and how they affect bias, variance, computation, and class imbalance handling:

  • n_estimators
  • max_depth
  • max_features (a.k.a. mtry)
  • min_samples_leaf
  • class_weight
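
The parameter names above match scikit-learn's RandomForestClassifier, so a configuration sketch under that assumption may help anchor the discussion. The dataset is synthetic and every value is illustrative, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, imbalanced toy data (an assumption for illustration only).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)

forest = RandomForestClassifier(
    n_estimators=100,         # more trees: lower variance and more compute
    max_depth=None,           # no depth cap: low bias, high per-tree variance
    max_features="sqrt",      # fewer candidate features per split: less correlated trees
    min_samples_leaf=5,       # larger leaves: smoother estimates, slightly more bias
    class_weight="balanced",  # weight classes inversely to their frequency
    random_state=0,           # fixed seed for reproducibility
)
forest.fit(X, y)
probabilities = forest.predict_proba(X[:5])
```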

D. Out-of-Bag (OOB) Error Estimation

  1. What is OOB error and how is it computed?
  2. When is OOB reliable, and what are its limitations—especially with heavy class imbalance (e.g., 1% positive rate) or time-series data?
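
To make the OOB mechanics concrete, the sketch below simulates one tree's bootstrap draw on a hypothetical dataset with a 1% positive rate and reports which rows are left out of the bag. It uses only the standard library; the sizes and the seed are assumptions chosen for illustration.

```python
import random

def bootstrap_oob_indices(n_samples, rng):
    """Return (in-bag indices, out-of-bag indices) for one bootstrap draw."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob

rng = random.Random(0)
n_samples = 10_000
labels = [1 if i < 100 else 0 for i in range(n_samples)]  # 1% positives

in_bag, oob = bootstrap_oob_indices(n_samples, rng)
oob_fraction = len(oob) / n_samples          # tends toward (1 - 1/n)^n, roughly 0.37
oob_positives = sum(labels[i] for i in oob)  # positives this tree can be scored on

print(f"OOB fraction: {oob_fraction:.3f}")
print(f"OOB positives for this tree: {oob_positives} of {sum(labels)}")
```

Each row's OOB prediction aggregates only the trees that did not see it. With few positives, each tree is scored on only a handful of them, which is one reason OOB estimates of minority-class metrics can be noisy. For time-ordered data, the OOB rows are interleaved with in-bag rows from both earlier and later periods, so the estimate does not mimic forecasting the future.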

E. Evaluation: Classification vs Regression

  • Recommend metrics for classification (e.g., ROC-AUC, PR-AUC, log loss, accuracy) and explain when accuracy is misleading.
  • Recommend metrics for regression (e.g., RMSE, MAE, R²) and explain when R² is misleading.
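
Two small worked examples illustrate how accuracy and R² can mislead. The data is made up for illustration, and the helper functions are plain-Python stand-ins for what a metrics library would provide.

```python
import math

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / sum(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Classification: 1% positives and a model that always predicts the negative class.
labels = [1] * 10 + [0] * 990
always_negative = [0] * 1000
clf_accuracy = accuracy(labels, always_negative)  # looks excellent
clf_recall = recall(labels, always_negative)      # finds no positives at all

# Regression: a single large miss dominates RMSE and pushes R² below zero.
targets = [1.0, 2.0, 3.0, 4.0]
preds = [1.0, 2.0, 3.0, 8.0]
reg_rmse = rmse(targets, preds)
reg_mae = mae(targets, preds)
reg_r2 = r_squared(targets, preds)
```

The classifier scores 99% accuracy with zero recall, and the regressor is exact on three of four points yet has a negative R² because the target range is narrow relative to the one error.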

F. Handling 1% Positive-Rate Imbalance

Describe practical steps for:

  • Threshold selection (including cost-sensitive thresholds or top-k selection)
  • Cost-sensitive learning (e.g., class_weight)
  • Calibrated probabilities
  • Evaluation with PR curves (and interpretation of baseline)
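
The threshold and baseline ideas above can be sketched in a few lines. This assumes the scores are calibrated probabilities and that the costs of a false positive and a false negative are known; the function names, costs, and scores are all hypothetical.

```python
def cost_threshold(cost_fp, cost_fn):
    """Flag a case when p * cost_fn > (1 - p) * cost_fp, i.e. p above this value."""
    return cost_fp / (cost_fp + cost_fn)

def top_k(scores, k):
    """Indices of the k highest-scoring cases, for a fixed review budget."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def pr_baseline(labels):
    """Precision of a random ranker equals the positive rate."""
    return sum(labels) / len(labels)

# Hypothetical costs: a missed positive is 99 times worse than a false alarm.
threshold = cost_threshold(cost_fp=1.0, cost_fn=99.0)

scores = [0.002, 0.30, 0.015, 0.001, 0.008]  # assumed calibrated probabilities
flagged = [i for i, s in enumerate(scores) if s >= threshold]
budgeted = top_k(scores, k=2)

baseline = pr_baseline([1] * 10 + [0] * 990)  # reference line on a PR plot
```

With these costs the threshold lands near 0.01 rather than the default 0.5, which is why calibration matters: the rule is only meaningful if a score of 0.01 really corresponds to about a 1% chance of being positive.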

G. Feature Importance and Pitfalls

Explain:

  • Impurity-based importance and its biases
  • Permutation importance (including OOB or validation-based)
  • Grouped/conditional permutations for correlated features
  • Leakage pitfalls and how to avoid them
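
Permutation importance can be sketched without any library: shuffle one column of held-out data, re-score, and record the drop. The "model" below is a hypothetical stand-in that only looks at feature 0, so the expected pattern is easy to check.

```python
import random

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, column, rng, n_repeats=5):
    """Mean drop in accuracy after shuffling one feature column."""
    baseline = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return sum(drops) / n_repeats

rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(500)]  # synthetic held-out data
y = [int(row[0] > 0.5) for row in X]

def model(row):
    """Hypothetical fitted model that uses only feature 0."""
    return int(row[0] > 0.5)

importance_0 = permutation_importance(model, X, y, 0, rng)
importance_1 = permutation_importance(model, X, y, 1, rng)
```

Feature 1 gets exactly zero importance because the model never reads it. For correlated features, shuffling the whole group together (a grouped permutation) avoids splitting the credit between near-duplicates; scikit-learn's `sklearn.inspection.permutation_importance` offers the single-column version on a fitted estimator.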

H. When Random Forests Underperform vs Gradient Boosting

  • Why and when might Random Forests underperform compared to XGBoost/LightGBM?
  • Provide a scenario where Random Forests are preferable.

I. Concrete Validation Plan for a Tabular Dataset

Provide a step-by-step, reproducible plan to validate a Random Forest, including:

  • Train/validation/test split strategy (i.i.d. vs time-based)
  • Cross-validation setup
  • Early-stopping proxies for Random Forests
  • Threshold tuning, probability calibration, and final evaluation
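
Two pieces of such a plan lend themselves to a short sketch: a seeded, stratified train/validation/test split for i.i.d. data, and a plateau rule on validation or OOB error as an early-stopping proxy for the number of trees. The helper names, fractions, tolerance, and error values are hypothetical; for time-ordered data the split would instead cut by date, with the latest period held out as the test set.

```python
import random

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return (train, val, test) index lists that preserve class ratios."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return sorted(train), sorted(val), sorted(test)

def trees_at_plateau(error_by_n_trees, tol=1e-3, patience=2):
    """Last tree count that improved the error by more than tol."""
    best_n, best_err, stale = None, float("inf"), 0
    for n_trees, err in sorted(error_by_n_trees.items()):
        if best_err - err > tol:
            best_n, best_err, stale = n_trees, err, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_n

labels = [1] * 10 + [0] * 990  # 1% positives
train, val, test = stratified_split(labels)

# Hypothetical error measured at increasing tree counts.
errors = {50: 0.120, 100: 0.101, 200: 0.1005, 400: 0.1003}
chosen_n_trees = trees_at_plateau(errors)
```

The test indices would be set aside until the end; cross-validation, threshold tuning, and calibration would use only the train and validation portions, with a single final evaluation on the test set.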

