Binary Classification with Logistic Regression and Regularization
Data
- Two CSVs: a training set x and a test set x_test.
- Each has 7 columns:
  - Column 1: binary outcome y ∈ {0, 1}.
  - Columns 2–7: six continuous features f1–f6.
Task
Using Python (scikit-learn) or R (glmnet), do the following:

Load data
- X_train = columns 2–7 of x; y_train = column 1 of x.
- X_test = columns 2–7 of x_test; y_test = column 1 of x_test.
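In Python, the load step might look like the sketch below. The filenames x.csv and x_test.csv and the header-less layout are assumptions; the synthetic writer exists only to make the snippet self-contained, and would be dropped when the real files are present.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def make_csv(path, n):
    # Stand-in data matching the described layout:
    # column 1 is y, columns 2-7 are f1-f6.
    y = rng.integers(0, 2, size=(n, 1))
    X = rng.normal(size=(n, 6))
    pd.DataFrame(np.hstack([y, X])).to_csv(path, header=False, index=False)

make_csv("x.csv", 200)
make_csv("x_test.csv", 100)

# Assumed: no header row, so header=None and positional column access.
train = pd.read_csv("x.csv", header=None)
test = pd.read_csv("x_test.csv", header=None)

y_train, X_train = train.iloc[:, 0].astype(int), train.iloc[:, 1:7]
y_test, X_test = test.iloc[:, 0].astype(int), test.iloc[:, 1:7]

print(X_train.shape, y_train.shape)  # → (200, 6) (200,)
```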
Basic diagnostics
- Check missingness by column.
- Provide summary statistics for the features.
- Standardize the features, fitting the scaler on the training set only.
- Compute pairwise feature correlations; report any pair with |r| ≥ 0.8.
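The diagnostics above could be sketched as follows; synthetic data stands in for the real X_train, and the feature names f1–f6 follow the data description.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(200, 6)),
                       columns=[f"f{i}" for i in range(1, 7)])

# Missingness by column.
print(X_train.isna().sum())

# Summary statistics for the features.
print(X_train.describe())

# Standardize: fit on the training set only, reuse the same
# fitted scaler later for the test set.
scaler = StandardScaler()
X_train_std = pd.DataFrame(scaler.fit_transform(X_train),
                           columns=X_train.columns)

# Pairwise correlations; flag off-diagonal pairs with |r| >= 0.8.
corr = X_train.corr()
high = [(a, b, corr.loc[a, b])
        for i, a in enumerate(corr.columns)
        for b in corr.columns[i + 1:]
        if abs(corr.loc[a, b]) >= 0.8]
print("highly correlated pairs:", high)
```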
Modeling
- Fit a baseline logistic regression (no regularization).
- Fit regularized models with L1 (Lasso) and L2 (Ridge) penalties.
- Use cross-validation to select the penalty strength (C in scikit-learn or lambda in glmnet).
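With scikit-learn, the three fits could be sketched as below. Synthetic data stands in for the real standardized training set, and a very large C approximates "no regularization" portably across scikit-learn versions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))          # stand-in standardized features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Baseline: effectively unregularized (huge C = tiny penalty).
base = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)

# L1 and L2 with 5-fold CV over a grid of C values; in scikit-learn,
# C is the inverse of the regularization strength.
Cs = np.logspace(-3, 3, 13)
l1 = LogisticRegressionCV(Cs=Cs, cv=5, penalty="l1", solver="liblinear",
                          scoring="roc_auc").fit(X, y)
l2 = LogisticRegressionCV(Cs=Cs, cv=5, penalty="l2",
                          scoring="roc_auc", max_iter=1000).fit(X, y)

print("chosen C (L1):", l1.C_[0], " chosen C (L2):", l2.C_[0])
```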
Evaluation on the held-out test set
- Report ROC AUC (required).
- Report at least one threshold-dependent metric (e.g., F1) at the threshold that maximizes Youden's J, with that threshold selected on the validation folds.
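A sketch of the metric computations, again on synthetic stand-in data. For brevity it picks the Youden's-J threshold on a single validation split; the actual pipeline would pick it on the cross-validation folds as specified above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))          # stand-in features
y = (X[:, 0] - X[:, 2] + rng.normal(scale=0.7, size=400) > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25,
                                            random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
p_val = model.predict_proba(X_val)[:, 1]

# Youden's J = TPR - FPR; take the ROC point that maximizes it.
fpr, tpr, thresholds = roc_curve(y_val, p_val)
t_star = thresholds[np.argmax(tpr - fpr)]

auc = roc_auc_score(y_val, p_val)
f1 = f1_score(y_val, (p_val >= t_star).astype(int))
print(f"AUC={auc:.3f}  threshold={t_star:.3f}  F1={f1:.3f}")
```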
Model comparison and interpretation
- Compare the baseline, L1, and L2 models and justify the chosen regularization.
- Identify the most important features:
  - L1: features with non-zero coefficients.
  - L2: features with the largest absolute standardized coefficients.
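Both importance readings could be extracted as sketched here, on synthetic already-standardized stand-in data (the penalty strengths are illustrative, not the CV-selected values):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))          # stand-in, already standardized
y = (2 * X[:, 0] + X[:, 3] + rng.normal(scale=0.5, size=300) > 0).astype(int)
names = [f"f{i}" for i in range(1, 7)]

# L1 at a fairly strong penalty: surviving (non-zero) coefficients
# mark the selected features.
l1 = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
selected = [n for n, c in zip(names, l1.coef_[0]) if c != 0]

# L2: rank by |coefficient|, which is comparable across features
# only because the inputs are standardized.
l2 = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
ranked = sorted(zip(names, np.abs(l2.coef_[0])), key=lambda t: -t[1])

print("L1 selected:", selected)
print("L2 ranking:", [n for n, _ in ranked])
```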
Outputs
- Final predicted probabilities for X_test.
- Confusion matrix on X_test at the selected threshold.
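Producing both deliverables could look like this sketch; synthetic data stands in for the real train/test split, and the 0.5 threshold is a placeholder for the value selected on the validation folds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 6))    # stand-in training data
y_train = (X_train[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)
X_test = rng.normal(size=(100, 6))     # stand-in test data
y_test = (X_test[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]   # final predicted probabilities
t_star = 0.5                                # placeholder for selected threshold

cm = confusion_matrix(y_test, (proba >= t_star).astype(int))
print("first 5 probabilities:", np.round(proba[:5], 3))
print("confusion matrix:\n", cm)
```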