Explain AUC, activations, ensembles, and imbalance
Company: Boston Consulting Group
Role: Data Scientist
Category: Machine Learning
Difficulty: medium
Interview Round: Take-home Project
Answer all sub-questions precisely.
AUC/Ranking: Given scores s = [0.10, 0.40, 0.35, 0.80, 0.60] and labels y = [0, 1, 0, 1, 0], compute the ROC AUC exactly via pairwise positive–negative comparisons (no library). Then draw the ROC points and compute the area by trapezoids; both methods should give the same value. How would extreme class imbalance (1% positives) change how you interpret AUC vs. Average Precision?
Activations: For each scenario, pick the output-layer activation and loss, and justify the choice: (a) single-label multi-class (K = 7), (b) multi-label (K = 7), (c) bounded regression in [0, 1], (d) unbounded regression with outliers. Discuss vanishing gradients for sigmoid/tanh and why leaky-ReLU or GELU might help in hidden layers.
MSE vs MAE: Explain the optimization and robustness differences (gradients, influence of outliers, median vs. mean optimality).
Ensembles: Contrast bagging and boosting in terms of bias/variance, and explain when you’d choose each for noisy data.
Overfitting: Name two concrete, testable diagnostics (with plots/metrics) and two mitigation tactics that won’t leak validation information.
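For the AUC/Ranking sub-question, here is a minimal stdlib sketch, assuming Python 3 and exact `fractions.Fraction` arithmetic. The helper names `auc_pairwise`, `roc_points`, and `auc_trapezoid` are illustrative, not from any library. Positives score 0.40 and 0.80 and negatives score 0.10, 0.35, and 0.60. Five of the six positive–negative pairs are ordered correctly (only 0.40 < 0.60 fails), so AUC = 5/6 ≈ 0.833. Sweeping the threshold down through the distinct scores gives the ROC polyline (0,0) → (0,1/2) → (1/3,1/2) → (1/3,1) → (2/3,1) → (1,1). Its trapezoids sum to 0 + 1/6 + 0 + 1/3 + 1/3 = 5/6, the same value.

```python
from fractions import Fraction

def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = Fraction(0)
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += Fraction(1, 2)
    return wins / (len(pos) * len(neg))

def roc_points(scores, labels):
    """(FPR, TPR) after lowering the threshold past each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(Fraction(0), Fraction(0))]
    for t in sorted(set(scores), reverse=True):
        # tied scores cross the threshold together, giving a diagonal segment
        for s, y in zip(scores, labels):
            if s == t:
                if y == 1:
                    tp += 1
                else:
                    fp += 1
        points.append((Fraction(fp, n_neg), Fraction(tp, n_pos)))
    return points

def auc_trapezoid(points):
    """Trapezoidal rule over consecutive ROC points (already sorted by FPR)."""
    return sum(((x1 - x0) * (y0 + y1) / 2
                for (x0, y0), (x1, y1) in zip(points, points[1:])), Fraction(0))

s = [0.10, 0.40, 0.35, 0.80, 0.60]
y = [0, 1, 0, 1, 0]
print(auc_pairwise(s, y))                                   # 5/6
print([(str(fpr), str(tpr)) for fpr, tpr in roc_points(s, y)])
print(auc_trapezoid(roc_points(s, y)))                      # 5/6
```

The two methods agree by construction. Each horizontal step of the ROC curve adds the share of negatives that are outranked by the positives already above the threshold, which is exactly the pairwise count.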
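The sketch below shows why AUC and Average Precision diverge under imbalance. It reuses the same toy scores and repeats every negative ten times. Here `average_precision` follows the step-wise definition AP = Σ_n (R_n − R_{n−1}) P_n, which is also how scikit-learn documents `average_precision_score`. The helper names are illustrative. Every positive–negative comparison is unchanged, so AUC stays at 5/6. The single negative scored above a positive, however, now becomes ten false positives, and AP falls from 5/6 to 7/12. With 1% positives, a high AUC can coexist with a top of the ranking full of false alarms, because AUC averages over a huge pool of easy negatives. AP tracks precision where it matters, and its chance-level baseline is the prevalence (about 0.01) rather than 0.5.

```python
from fractions import Fraction

def auc_pairwise(scores, labels):
    """ROC AUC as the fraction of correctly ordered (positive, negative) pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(Fraction(1) if p > n else Fraction(1, 2) if p == n else Fraction(0)
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Step-wise AP: sum over thresholds of (recall gained) * (precision there)."""
    n_pos = sum(labels)
    tp = fp = 0
    prev_recall, ap = Fraction(0), Fraction(0)
    for t in sorted(set(scores), reverse=True):
        for s, y in zip(scores, labels):
            if s == t:
                tp += y
                fp += 1 - y
        recall = Fraction(tp, n_pos)
        ap += (recall - prev_recall) * Fraction(tp, tp + fp)
        prev_recall = recall
    return ap

pos_scores = [0.40, 0.80]
neg_scores = [0.10, 0.35, 0.60]
for copies in (1, 10):
    # same score distributions, increasingly imbalanced class ratio
    scores = pos_scores + neg_scores * copies
    labels = [1] * len(pos_scores) + [0] * (len(neg_scores) * copies)
    print(copies, auc_pairwise(scores, labels), average_precision(scores, labels))
# 1 5/6 5/6
# 10 5/6 7/12
```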
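For the four output-layer scenarios, the following stdlib sketch writes each choice as a loss on raw logits. The function names are illustrative; deep-learning frameworks ship fused, numerically stable versions of the same losses.
(a) Softmax + categorical cross-entropy, because the 7 classes are mutually exclusive and the probabilities must sum to 1.
(b) 7 independent sigmoids + binary cross-entropy, because labels can co-occur.
(c) A sigmoid output, which keeps predictions inside [0, 1], trained with cross-entropy on the soft target. Its gradient with respect to the logit is simply sigmoid(z) − t. MSE through a sigmoid also works, but its gradient is multiplied by sigmoid'(z) and vanishes when the unit saturates.
(d) An identity output with Huber loss, which is quadratic for small residuals and linear for large ones, so each outlier's gradient is capped at delta.

```python
import math

def softmax_cross_entropy(logits, target):
    """(a) Single-label, K classes: -log softmax(logits)[target], via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]

def bce_with_logits(z, t):
    """Binary cross-entropy of sigmoid(z) against a target t in [0, 1], overflow-safe."""
    return max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))

def multilabel_bce(logits, targets):
    """(b) Multi-label, K classes: one independent sigmoid + BCE per class, averaged."""
    return sum(bce_with_logits(z, t) for z, t in zip(logits, targets)) / len(logits)

def bounded_regression_loss(z, target):
    """(c) Target in [0, 1]: sigmoid output + BCE on the soft target; dloss/dz = sigmoid(z) - target."""
    return bce_with_logits(z, target)

def huber(pred, target, delta=1.0):
    """(d) Unbounded target with outliers: identity output + Huber loss."""
    r = abs(pred - target)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)

K = 7
print(softmax_cross_entropy([0.0] * K, 3))               # log(7): uniform prediction
print(softmax_cross_entropy([1000.0] + [0.0] * 6, 0))    # 0.0, and no overflow
print(multilabel_bce([0.0] * K, [1, 0, 1, 0, 0, 0, 1]))  # log(2) per class
print(huber(0.0, 0.5), huber(0.0, 10.0))                 # 0.125 9.5
```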
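For the vanishing-gradient discussion, the next sketch evaluates each activation's derivative and raises it to the 10th power. That power is the factor backpropagation multiplies in if every one of 10 layers sits at the same pre-activation x. It deliberately ignores the weight matrices, so it illustrates saturation rather than computing a real network gradient.
The sigmoid derivative never exceeds 0.25. Both the sigmoid and tanh derivatives decay toward zero as |x| grows, so the product shrinks geometrically with depth.
Leaky-ReLU has slope 1 for positive inputs and a small nonzero slope for negative ones. The value alpha = 0.01 is an assumed default.
GELU, x·Φ(x) with Φ the standard normal CDF, approaches slope 1 for large positive inputs while staying smooth near zero. Its derivative is still below 1 near zero and near 0 for large negative inputs, so it mitigates the problem rather than eliminating it.

```python
import math

def d_sigmoid(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)                # at most 0.25, reached at x = 0

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2      # at most 1.0, decays quickly as |x| grows

def d_leaky_relu(x, alpha=0.01):
    return 1.0 if x > 0 else alpha      # never exactly zero

def d_gelu(x):
    # exact GELU(x) = x * Phi(x), so GELU'(x) = Phi(x) + x * phi(x)
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

depth = 10
for x in (0.5, 3.0):
    for name, d in [("sigmoid", d_sigmoid), ("tanh", d_tanh),
                    ("leaky_relu", d_leaky_relu), ("gelu", d_gelu)]:
        # chain-rule factor after `depth` layers, all at pre-activation x
        print(f"x={x} {name:10s} {d(x) ** depth:.3e}")
```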
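For MSE vs. MAE, the sketch below fits the best constant prediction to five made-up targets containing one outlier, using brute-force grid search.
MSE's per-point gradient 2(c − y) grows with the residual, so the outlier at 100 pulls the optimum to the mean, 22.
MAE's per-point subgradient sign(c − y) is bounded by 1, so the optimum is the median, 3. Pushing the outlier further out moves the mean but not the median.
The same contrast drives optimization. MSE's gradient shrinks smoothly as residuals shrink. MAE's gradient keeps a constant magnitude and is undefined at zero residual, so plain gradient descent on MAE tends to oscillate around the optimum unless the step size decays.

```python
import statistics

def mse(c, ys):
    return sum((c - y) ** 2 for y in ys) / len(ys)

def mae(c, ys):
    return sum(abs(c - y) for y in ys) / len(ys)

def mse_grad(c, ys):
    # each point contributes 2 * residual: unbounded, dominated by outliers
    return sum(2 * (c - y) for y in ys) / len(ys)

def mae_grad(c, ys):
    # each point contributes sign(residual) in {-1, 0, +1}: bounded influence
    return sum((c > y) - (c < y) for y in ys) / len(ys)

ys = [1.0, 2.0, 3.0, 4.0, 100.0]              # one outlier
grid = [i / 100 for i in range(10001)]        # candidate constants 0.00 .. 100.00
best_mse = min(grid, key=lambda c: mse(c, ys))
best_mae = min(grid, key=lambda c: mae(c, ys))
print(best_mse, statistics.mean(ys))          # 22.0 22.0
print(best_mae, statistics.median(ys))        # 3.0 3.0
print(mse_grad(3.0, ys), mae_grad(3.0, ys))   # -38.0 0.0

ys_far = [1.0, 2.0, 3.0, 4.0, 1000.0]         # push the outlier 10x further out
print(statistics.mean(ys_far), statistics.median(ys_far))  # 202.0 3.0
```

At the median, the MAE subgradient is already zero, while the MSE gradient of −38 is almost entirely the outlier's contribution (−38.8).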
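For the ensembles sub-question, here is a from-scratch sketch on synthetic data: a noisy sine curve, with arbitrary seeds and hyperparameters chosen as assumptions.
Bagging averages low-bias, high-variance learners, which mainly reduces variance. Here the learners are 1-nearest-neighbour regressors, each fit on a bootstrap resample.
Boosting adds high-bias learners sequentially, which mainly reduces bias. Here the learners are depth-1 regression stumps, each fit to the residuals of the current ensemble. With squared loss and a learning rate in (0, 1], training error cannot increase from one round to the next, which is also why enough rounds will eventually fit the noise.
On noisy labels, bagging is usually the safer default. Boosting needs shrinkage, early stopping, or a robust loss such as Huber to avoid chasing mislabeled or outlying points.

```python
import math
import random

def fit_stump(xs, targets):
    """Least-squares regression stump: one split on x, a constant mean on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)
    stumps, preds, train_mse = [], [base] * len(xs), []
    for _ in range(rounds):
        h = fit_stump(xs, [y - p for y, p in zip(ys, preds)])
        stumps.append(h)
        preds = [p + lr * h(x) for x, p in zip(xs, preds)]
        train_mse.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return (lambda x: base + lr * sum(h(x) for h in stumps)), train_mse

def one_nn(train):
    """1-nearest-neighbour regressor: near-zero bias, high variance."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

def bag(xs, ys, n_models=25, seed=0):
    """Bagging: fit 1-NN on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    models = [one_nn([rng.choice(data) for _ in data]) for _ in range(n_models)]
    return lambda x: sum(m(x) for m in models) / len(models)

rng = random.Random(42)
xs = [i / 10 for i in range(60)]
ys = [math.sin(x) + rng.gauss(0.0, 0.3) for x in xs]   # noisy labels
test_xs = [x + 0.05 for x in xs[:-1]]

boosted, train_curve = boost(xs, ys)
bagged = bag(xs, ys)
single = one_nn(list(zip(xs, ys)))
for name, model in [("single 1-NN", single), ("bagged 1-NN", bagged),
                    ("boosted stumps", boosted)]:
    err = sum((model(x) - math.sin(x)) ** 2 for x in test_xs) / len(test_xs)
    print(f"{name:15s} test MSE vs. noise-free sine: {err:.3f}")
```

Comparing the single and bagged 1-NN errors shows the variance reduction. Plotting `train_curve` against more rounds shows boosting driving training error down, past the noise level if left unchecked.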
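For the Overfitting sub-question, the next sketch shows one diagnostic and one mitigation computed from per-epoch loss curves. The curves are hypothetical numbers, not results.
The diagnostic is the train/validation gap. Train loss that keeps falling while validation loss turns upward, so that the gap widens epoch after epoch, is the classic signature; plotted, this is the learning curve.
The mitigation is early stopping with patience.
To avoid leaking information, the validation losses must come from a split carved out of the training data or from an inner cross-validation fold, with any preprocessing fit only on the training portion. The final test set is scored once, after all such choices are made.

```python
def generalization_gap(train_losses, val_losses):
    """Validation minus training loss, per epoch."""
    return [v - t for t, v in zip(train_losses, val_losses)]

def early_stopping_epoch(val_losses, patience=3):
    """Index of the best epoch; stop once `patience` epochs pass without improvement."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch

# Hypothetical loss curves (illustrative numbers only)
train = [1.00, 0.70, 0.50, 0.38, 0.30, 0.24, 0.19, 0.15]
val = [1.05, 0.78, 0.62, 0.58, 0.60, 0.64, 0.69, 0.75]

gap = generalization_gap(train, val)
print([round(g, 2) for g in gap])   # widening after epoch index 3
print(early_stopping_epoch(val))    # 3: keep the weights from epoch index 3
```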
Quick Answer: This question evaluates competency in model evaluation metrics (ROC AUC and Average Precision), handling class imbalance, choice of output activations and loss functions, robustness to outliers (MSE vs MAE), ensemble methods, and overfitting diagnostics within the Machine Learning domain.