This question evaluates competency in ensemble model selection (bagging vs. boosting), handling extreme class imbalance, metric selection, hyperparameter tuning, time-aware validation, and failure-mode diagnosis for large-scale, time-ordered binary classification. It is commonly asked because it probes bias–variance trade-offs, robustness to label noise on the minority class, and practical concerns around evaluation and leakage avoidance. It assesses conceptual understanding of algorithmic trade-offs and statistical robustness, as well as practical skills: designing time-blocked cross-validation, selecting appropriate metrics for imbalanced data, specifying hyperparameter grids, and interpreting diagnostic plots.
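For example, time-blocked validation keeps every training fold strictly earlier in time than its test fold, so the model never trains on the future. Here is a minimal sketch, assuming scikit-learn's TimeSeriesSplit, PR-AUC (average precision) as the fold metric, and synthetic stand-in data already sorted by time; the fold count and model settings are illustrative, not prescriptive:

```python
# Minimal sketch of time-blocked cross-validation for time-ordered data.
# Assumptions (not from the question): synthetic stand-in data, 5 folds,
# a small Random Forest; rows are assumed sorted by transaction time.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
n = 50_000
X = rng.normal(size=(n, 20))             # stand-in for the 300 features
y = (rng.random(n) < 0.005).astype(int)  # ~0.5% positive rate

tscv = TimeSeriesSplit(n_splits=5)       # each fold: train on past, test on future
fold_scores = []
for train_idx, test_idx in tscv.split(X):
    clf = RandomForestClassifier(
        n_estimators=100,
        class_weight="balanced",         # one simple imbalance adjustment
        n_jobs=-1,
        random_state=0,
    ).fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])[:, 1]
    fold_scores.append(average_precision_score(y[test_idx], proba))

print("PR-AUC per fold:", np.round(fold_scores, 4))
```

The expanding-window folds mimic production deployment: each model is fit only on transactions that precede its evaluation period.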

You are building a binary classifier to detect 0.5% fraudulent events among 10,000,000 time-ordered transactions with 300 features (100 numeric, 200 one-hot). You must choose between a bagged Random Forest and a Gradient Boosting model (e.g., XGBoost or LightGBM).
Address the following:
- Model choice: the trade-offs between bagging and boosting for this data, including bias–variance behavior and robustness to label noise on the minority class.
- Class imbalance: how you would handle the 0.5% positive rate during training.
- Metrics: which evaluation metrics are appropriate at this level of imbalance, and why.
- Hyperparameter tuning: which hyperparameters you would tune and how you would specify the grid.
- Validation: how you would design time-aware (time-blocked) cross-validation that avoids leakage.
- Failure modes: how you would diagnose failures using diagnostic plots.
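A strong answer pairs the metric and imbalance choices with concrete settings. As one illustration for the boosting candidate: upweight the rare class with scale_pos_weight ≈ negatives / positives and track PR-AUC rather than accuracy. The sketch below assumes the xgboost Python package; the synthetic data, split, and hyperparameter values are placeholders, not a recommended configuration:

```python
# Illustrative imbalance handling for the boosting candidate.
# Assumptions (not from the question): synthetic data, a random split
# standing in for a proper time-ordered one, placeholder hyperparameters.
import numpy as np
import xgboost as xgb
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(1)
X_train = rng.normal(size=(50_000, 20))
y_train = (rng.random(50_000) < 0.005).astype(int)  # ~0.5% fraud
X_valid = rng.normal(size=(10_000, 20))
y_valid = (rng.random(10_000) < 0.005).astype(int)

# Common heuristic: weight positives by the negative/positive ratio (~199 at 0.5%).
pos_weight = (y_train == 0).sum() / max((y_train == 1).sum(), 1)

model = xgb.XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=6,
    scale_pos_weight=pos_weight,  # upweights the rare fraud class
    eval_metric="aucpr",          # PR-AUC suits extreme imbalance
    n_jobs=-1,
)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)

proba = model.predict_proba(X_valid)[:, 1]
print(f"validation PR-AUC: {average_precision_score(y_valid, proba):.4f}")
```

In practice the train/validation split should preserve time order, matching the time-blocked scheme above, and the same weighting idea maps to class_weight="balanced" for the Random Forest.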