What difficulty level is this interview question?

This is a medium difficulty Machine Learning question, commonly asked during Technical Screen rounds at Citadel.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Citadel during technical interviews.

Explain RF optimization and variable-importance pitfalls

Last updated: Apr 23, 2026

Quick Overview

This question tests understanding of Random Forest regularization and feature-importance diagnostics: the differing biases of mean decrease impurity and permutation importance, how to estimate importances reliably, and how to train efficiently on large tabular datasets.

Citadel

Oct 13, 2025, 9:49 PM

Medium • Data Scientist • Technical Screen • Machine Learning

Optimize and Regularize a Random Forest Regressor for Tabular Data

Context: You are training a Random Forest (RF) regressor on tabular data and need to both regularize the model and interpret feature importance reliably, while keeping training efficient on large datasets.

Explain the following:

Why classic RFs do not prune trees post-training, and how max_depth, min_samples_leaf, and max_features control overfitting.
Two importance measures, mean decrease impurity (MDI) and permutation importance (PI): how each is computed, when they disagree, and what biases each has (e.g., favoring high-cardinality categoricals, or distortion from correlated features).
How to obtain reliable importances using out-of-bag (OOB) estimates, repeated permutations, or conditional permutation schemes that account for correlations.
Practical steps to speed training on large data (e.g., subsampling, feature bagging, warm-starting trees).
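
The points above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not a reference answer: the synthetic data, the feature names (x0, x1, x0_copy, noise_id), and every hyperparameter value are assumptions chosen only to make the contrast between MDI and permutation importance visible. The forest is regularized with max_depth, min_samples_leaf, and max_features, uses max_samples to subsample rows per tree, and reports an OOB score; permutation importance is computed on held-out data with repeated shuffles.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic data (an assumption for illustration): two informative features,
# a near-duplicate of x0, and a pure-noise column with many distinct values.
x0 = rng.normal(size=n)
x1 = rng.normal(size=n)
x0_copy = x0 + rng.normal(scale=0.05, size=n)            # strongly correlated with x0
noise_id = rng.integers(0, 1000, size=n).astype(float)   # high-cardinality noise
X = np.column_stack([x0, x1, x0_copy, noise_id])
y = 3.0 * x0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)
names = ["x0", "x1", "x0_copy", "noise_id"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

rf = RandomForestRegressor(
    n_estimators=200,
    max_depth=12,          # cap tree depth instead of pruning after training
    min_samples_leaf=5,    # larger leaves smooth predictions
    max_features=0.5,      # feature bagging: half of the features per split
    max_samples=0.5,       # row subsampling: each tree sees half of the rows
    bootstrap=True,
    oob_score=True,        # out-of-bag R^2 as a built-in validation estimate
    n_jobs=-1,             # trees are independent, so fit them in parallel
    random_state=0,
)
rf.fit(X_train, y_train)

# MDI: impurity decrease accumulated over splits, normalized to sum to 1.
mdi = rf.feature_importances_

# PI: drop in held-out R^2 when one column is shuffled, repeated 10 times
# so that each feature gets a mean and a standard deviation.
pi = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)

print(f"OOB R^2: {rf.oob_score_:.3f}")
for j, name in enumerate(names):
    print(
        f"{name:9s} MDI={mdi[j]:.3f}  "
        f"PI={pi.importances_mean[j]:.3f} +/- {pi.importances_std[j]:.3f}"
    )
```

When reading the output, compare how the two measures treat noise_id and how they split credit between x0 and x0_copy; the exact numbers depend on the seed and settings. For the warm-starting item, scikit-learn's forests accept warm_start=True, in which case raising n_estimators and calling fit again adds trees to the existing ensemble rather than refitting it from scratch.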
