This question evaluates competency in handling class imbalance, designing representative sampling strategies, verifying sample-to-population generalization, preventing overfitting in tree-based models, and selecting evaluation metrics for highly imbalanced binary classification.
You are training a binary classifier on a very large dataset in which the positive class is rare. Because the full dataset is too large to train on directly, you plan to draw a sample and train a tree-based model on that sample.
Explain how you would: