Technical Screen — Machine Learning
Answer all parts precisely.
1) Binary logistic regression: model, loss, gradient, convexity
- Define the model: p(y=1 | x) = σ(w · x + b).
- Derive the negative log-likelihood (log-loss) and its gradient with respect to w and b (a worked reference follows this list).
- Explain why the loss is convex and the implications for optimization.
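A compact reference for what a complete answer covers, assuming the common labeling y_i ∈ {0, 1} and writing z_i = w · x_i + b:

```latex
% Negative log-likelihood (log-loss) over n examples, with z_i = w \cdot x_i + b
\[
\mathcal{L}(w,b) \;=\; -\sum_{i=1}^{n}\Big[\, y_i \log \sigma(z_i) + (1-y_i)\log\big(1-\sigma(z_i)\big) \Big]
\]

% Gradients, using \sigma'(z) = \sigma(z)(1-\sigma(z))
\[
\frac{\partial \mathcal{L}}{\partial w} \;=\; \sum_{i=1}^{n}\big(\sigma(z_i)-y_i\big)\,x_i ,
\qquad
\frac{\partial \mathcal{L}}{\partial b} \;=\; \sum_{i=1}^{n}\big(\sigma(z_i)-y_i\big)
\]

% Convexity: the Hessian in w is a nonnegatively weighted sum of outer products,
% hence positive semidefinite, so any stationary point is a global minimum.
\[
\nabla_{w}^{2}\,\mathcal{L} \;=\; \sum_{i=1}^{n} \sigma(z_i)\big(1-\sigma(z_i)\big)\, x_i x_i^{\top} \;\succeq\; 0
\]
```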
2) L1 vs L2 regularization in logistic regression
- Compare their effects on:
  - Sparsity
  - Handling of multicollinearity
  - Margin geometry
  - Probability calibration
- When would you pick elastic net over pure L1 or L2? (A short setup sketch follows this list.)
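For context, all three penalties map onto a single estimator in a typical implementation; a minimal scikit-learn sketch (the C and l1_ratio values are placeholders to tune, not recommendations):

```python
# L2, L1, and elastic-net penalized logistic regression in scikit-learn.
# Only the "saga" solver supports the elastic-net penalty; l1_ratio interpolates
# between pure L2 (0.0) and pure L1 (1.0).
from sklearn.linear_model import LogisticRegression

l2_model = LogisticRegression(penalty="l2", C=1.0, solver="lbfgs", max_iter=1000)
l1_model = LogisticRegression(penalty="l1", C=1.0, solver="saga", max_iter=1000)
enet_model = LogisticRegression(
    penalty="elasticnet",
    solver="saga",
    l1_ratio=0.5,   # mix of L1 and L2; tune by cross-validation
    C=1.0,          # inverse regularization strength; tune as well
    max_iter=1000,
)
```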
3) When logistic regression can outperform a random forest
Discuss conditions such as:
- (a) Truly linear or near-linear decision boundaries with limited interactions
- (b) High-dimensional sparse binary features (e.g., text; see the toy sketch after this list)
- (c) Small-n, large-p regimes where strong regularization helps
- (d) When calibrated probabilities and interpretability are priorities
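To make condition (b) concrete, a toy sketch of the sparse-text setting where a regularized linear model is a strong baseline (the data and parameters are illustrative placeholders only):

```python
# High-dimensional sparse text features: TfidfVectorizer yields a sparse matrix
# that a linear model consumes directly; a random forest must grow splits over
# a very wide, mostly-zero design matrix.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["cheap meds now", "quarterly report attached", "win a prize today", "meeting moved to 3pm"]
labels = [1, 0, 1, 0]  # toy stand-in for a real labeled corpus

text_clf = make_pipeline(
    TfidfVectorizer(),                         # sparse bag-of-words features
    LogisticRegression(C=1.0, max_iter=1000),  # strong, fast baseline on sparse text
)
text_clf.fit(texts, labels)
```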
4) Remedies for overfitting and diagnostics beyond accuracy
- Logistic regression: regularization strength, feature selection, class weighting, calibration, proper cross-validation.
- Random forests: increase trees, limit depth, max_features, min_samples_* settings, OOB validation.
- Boosting: learning rate, number of estimators, max_depth/leaf-wise growth, subsampling, early stopping.
- How to detect overfitting beyond accuracy (e.g., calibration curves, PR-AUC vs ROC-AUC, decision boundary checks); see the diagnostic sketch after this list.
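A sketch of the diagnostics named in the last bullet, on synthetic data (the dataset and model are placeholders; the evaluation calls are the point):

```python
# Diagnostics beyond accuracy: ROC-AUC vs PR-AUC and calibration-curve data.
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

print("ROC-AUC:", roc_auc_score(y_te, proba))            # overall ranking quality
print("PR-AUC :", average_precision_score(y_te, proba))  # more sensitive to the rare class
# Reliability-diagram data: a well-calibrated model tracks the diagonal.
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
```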
5) Random forests vs gradient boosting
Contrast them on:
- Bias–variance characteristics
- Robustness to noisy features
- Sensitivity to hyperparameters
- Ability to capture monotonic constraints
- Handling missing values natively
- Training and inference cost
Provide one real-world scenario where each clearly dominates the other and justify.
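Two of the contrasts above (monotonic constraints, native missing-value handling) are easiest to see in code; a scikit-learn sketch on toy data (parameter values are placeholders):

```python
# HistGradientBoostingClassifier accepts per-feature monotonic constraints and
# handles NaN natively; RandomForestClassifier exposes no constraint option.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.randn(500, 3)
X[::20, 1] = np.nan                                           # inject missing values, left as-is
y = (X[:, 0] + 0.5 * np.nan_to_num(X[:, 1]) > 0).astype(int)

gbdt = HistGradientBoostingClassifier(
    learning_rate=0.1,
    max_iter=200,
    early_stopping=True,
    monotonic_cst=[1, 0, 0],    # force a non-decreasing effect of feature 0
).fit(X, y)                     # NaN is routed at each split, no imputation needed

rf = RandomForestClassifier(n_estimators=300, max_features="sqrt", min_samples_leaf=5)
# No monotonic-constraint option here; NaN support depends on the scikit-learn
# version (older releases require imputation before calling rf.fit).
```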
6) Case study: Imbalanced, sparse, drifting data
Data: 50k rows, 10k sparse binary features, class imbalance (1% positive), strong temporal drift.
Propose end-to-end pipelines for:
- (a) Logistic regression with elastic net
- (b) A tree-based method (choose RF or GBDT)
Cover: feature processing, regularization/hyperparameters, evaluation protocol (time-based CV), threshold selection, probability calibration, and how to compare the models fairly. State pitfalls to avoid (e.g., leakage via target encoding, improper scaling of sparse inputs).
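One possible skeleton for pipeline (a), showing sparse-safe scaling, elastic-net regularization with class weighting, and a time-ordered evaluation loop (all data and parameter values are downsized placeholders, not recommendations):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MaxAbsScaler

rng = np.random.RandomState(0)
X = sp.random(5_000, 2_000, density=0.01, format="csr", random_state=rng)
X.data[:] = 1.0                              # binarize: toy stand-in for sparse binary features
y = (rng.rand(5_000) < 0.01).astype(int)     # ~1% positives, rows assumed time-ordered

# MaxAbsScaler preserves sparsity; a centering scaler would densify the wide sparse matrix.
model = make_pipeline(
    MaxAbsScaler(),
    LogisticRegression(
        penalty="elasticnet", solver="saga", l1_ratio=0.5, C=0.1,
        class_weight="balanced", max_iter=2000,
    ),
)

# Time-based CV: always fit on the past, evaluate on the future; report PR-AUC
# rather than accuracy. Calibration and threshold selection should also be done
# on past-only data inside each fold.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model.fit(X[train_idx], y[train_idx])
    scores = model.predict_proba(X[test_idx])[:, 1]
    print("PR-AUC:", average_precision_score(y[test_idx], scores))
```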