PracHub

Train and evaluate logistic model with regularization

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in supervised binary classification with logistic regression, implementation of L1 and L2 regularization, data preprocessing (missingness checks and standardization), hyperparameter selection via cross-validation, and model evaluation using ROC AUC and threshold-dependent metrics.
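For reference, here is roughly how the two penalties enter the binary logistic objective. This is a sketch based on the scikit-learn and glmnet documentation. The symbols $w$, $b$, $\hat p_i$, $\ell_i$ and $r(w)$ are notation introduced here. The C-to-λ correspondence is only approximate, because glmnet standardizes features internally by default and scikit-learn's `liblinear` solver also penalizes the intercept.

```latex
% Model: intercept b, weights w, features x_i, outcome y_i in {0, 1}, i = 1..n
\hat p_i = \sigma(b + x_i^\top w), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\ell_i = -y_i \log \hat p_i - (1 - y_i) \log (1 - \hat p_i)

% scikit-learn: C scales the data term, so a larger C means weaker regularization
\min_{w,b} \; C \sum_{i=1}^{n} \ell_i + r(w), \qquad
r(w) = \|w\|_1 \ (\text{L1}) \quad \text{or} \quad r(w) = \tfrac{1}{2}\|w\|_2^2 \ (\text{L2})

% glmnet: lambda scales the penalty; alpha = 1 gives Lasso, alpha = 0 gives Ridge
\min_{w,b} \; \frac{1}{n} \sum_{i=1}^{n} \ell_i
  + \lambda \Big[ \tfrac{1-\alpha}{2} \|w\|_2^2 + \alpha \|w\|_1 \Big]

% Dividing the scikit-learn objective by C n makes the two penalty weights agree when
\lambda \approx \frac{1}{C\, n}
```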



Company: Atlassian

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

You have two CSVs: training set x and test set x_test. Each has 7 columns: the first is the binary outcome y ∈ {0,1}; the next 6 are continuous features f1–f6. Using either Python (scikit-learn) or R (glmnet), write code to: (1) Load data (X_train = columns 2–7 of x; y_train = column 1; X_test = columns 2–7 of x_test; y_test = column 1). (2) Perform basic diagnostics: missingness, summary stats, feature scaling, and a correlation/collinearity check (report any |r| ≥ 0.8). (3) Fit a baseline logistic regression, then regularized models with L1 (Lasso) and L2 (Ridge). Use cross-validation to select the penalty strength (C in sklearn or lambda in glmnet). (4) Report metrics on the held-out test set: ROC AUC (required) and at least one threshold-dependent metric (e.g., F1 at the threshold maximizing Youden’s J on the validation fold). (5) Compare models, justify the chosen regularization, and identify the most important features (non-zero for L1 or largest absolute standardized coefficients for L2). (6) Provide the final predicted probabilities for X_test and a confusion matrix at your selected threshold.


Related Interview Questions

  • Minimize max L1 radius with k centers in 1D - Atlassian (Medium)
  • Minimize L1 Distance with k Cluster Centers in Array - Atlassian (Medium)
Atlassian
Oct 13, 2025, 9:49 PM
Data Scientist
Technical Screen
Machine Learning

Binary Classification with Logistic Regression and Regularization

Data

  • Two CSVs: a training set x and a test set x_test.
  • Each has 7 columns:
    • Column 1: binary outcome y ∈ {0, 1}.
    • Columns 2–7: six continuous features f1–f6.
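Here is a minimal loading sketch for step 1. It assumes pandas and CSV files that carry a header row; pass `header=None` for headerless files. The function name `load_split` and the file names `x.csv` / `x_test.csv` are hypothetical. Columns are selected by position, so the actual column names do not matter.

```python
import pandas as pd


def load_split(path, header=0):
    """Return (X, y) from one CSV laid out as y, f1, ..., f6.

    header=0 assumes a header row; use header=None for headerless files.
    """
    df = pd.read_csv(path, header=header)
    if df.shape[1] != 7:
        raise ValueError(f"expected 7 columns, found {df.shape[1]}")
    y = df.iloc[:, 0]     # column 1: binary outcome
    X = df.iloc[:, 1:7]   # columns 2-7: continuous features f1-f6
    # Missing outcomes are left for the diagnostics step; any other value is an error.
    if not set(y.dropna().unique()) <= {0, 1}:
        raise ValueError("outcome column must be coded 0/1")
    return X, y


# Hypothetical paths for the two files described above:
# X_train, y_train = load_split("x.csv")
# X_test, y_test = load_split("x_test.csv")
```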

Task

Using Python (scikit-learn) or R (glmnet), do the following:

  1. Load data
    • X_train = columns 2–7 of x; y_train = column 1 of x.
    • X_test = columns 2–7 of x_test; y_test = column 1 of x_test.
  2. Basic diagnostics
    • Check missingness by column.
    • Provide summary statistics for features.
    • Standardize features.
    • Compute pairwise feature correlations; report any |r| ≥ 0.8.
  3. Modeling
    • Fit a baseline logistic regression (no regularization).
    • Fit regularized models with L1 (Lasso) and L2 (Ridge) penalties.
    • Use cross-validation to select penalty strength (C in scikit-learn or lambda in glmnet).
  4. Evaluation on held-out test set
    • Report ROC AUC (required).
    • Report at least one threshold-dependent metric (e.g., F1) at the threshold that maximizes Youden's J, choosing that threshold on the validation folds rather than on the test set.
  5. Model comparison and interpretation
    • Compare baseline, L1, and L2 models and justify the chosen regularization.
    • Identify the most important features:
      • L1: non-zero coefficients.
      • L2: features with largest absolute standardized coefficients.
  6. Outputs
    • Final predicted probabilities for X_test.
    • Confusion matrix on X_test at the selected threshold.
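Step 2 of the task could look like the sketch below, assuming the features arrive as pandas DataFrames. `diagnose` is a hypothetical helper. The scaler is fit on the training set only and then applied to the test set, so no test information leaks into preprocessing. Variance inflation factors, computed as the diagonal of the inverse correlation matrix, are an assumed extra. Pairwise |r| alone can miss collinearity that involves three or more features.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def diagnose(X_train, X_test, r_threshold=0.8):
    """Missingness, summary stats, standardization and collinearity checks."""
    # Missingness: number of NaNs per feature in each split.
    missing = pd.DataFrame({"train": X_train.isna().sum(), "test": X_test.isna().sum()})

    # Summary statistics (count, mean, std, min, quartiles, max) on train.
    summary = X_train.describe().T

    # Standardize with training-set means/SDs only, then reuse them on test.
    scaler = StandardScaler().fit(X_train)
    Z_train = pd.DataFrame(scaler.transform(X_train), columns=X_train.columns, index=X_train.index)
    Z_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)

    # Pairwise Pearson correlations; report each pair with |r| >= threshold once.
    corr = X_train.corr()
    cols = list(corr.columns)
    flagged = [
        (a, b, round(float(corr.loc[a, b]), 3))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if abs(corr.loc[a, b]) >= r_threshold
    ]

    # VIF_j = j-th diagonal entry of the inverse correlation matrix.
    vif = pd.Series(np.diag(np.linalg.inv(corr.to_numpy())), index=cols)
    return missing, summary, Z_train, Z_test, flagged, vif
```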
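Steps 4 and 6 can be sketched with two hypothetical helpers. Youden's J is sensitivity + specificity - 1, which equals TPR - FPR. Here it is maximized over pooled out-of-fold training predictions from `cross_val_predict`. That is one reasonable reading of "selected on validation folds" and is an assumption. The cut-off is then frozen and applied once to the test set, so test labels never influence it. ROC AUC uses only the probabilities; F1 and the confusion matrix depend on the cut-off.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict


def youden_threshold(model, X_train, y_train, cv):
    """Cut-off maximizing Youden's J = TPR - FPR on out-of-fold probabilities.

    cross_val_predict refits a clone of `model` on each training split, so
    every row is scored by a model that never saw it.
    """
    oof = cross_val_predict(model, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
    fpr, tpr, thresholds = roc_curve(y_train, oof)
    return float(thresholds[np.argmax(tpr - fpr)])


def evaluate(model, threshold, X_test, y_test):
    """Test-set metrics at a fixed cut-off, plus the step-6 outputs."""
    proba = model.predict_proba(X_test)[:, 1]      # final P(y = 1) per test row
    pred = (proba >= threshold).astype(int)        # same ">=" rule as roc_curve
    return {
        "auc": roc_auc_score(y_test, proba),       # threshold-free
        "f1": f1_score(y_test, pred),              # threshold-dependent
        # rows = true class, columns = predicted: [[TN, FP], [FN, TP]]
        "confusion": confusion_matrix(y_test, pred, labels=[0, 1]),
        "proba": proba,
    }
```

With a fitted model `best` and the splitter `cv` used for tuning (both hypothetical names), the flow is `t = youden_threshold(best, X_train, y_train, cv)` followed by `evaluate(best, t, X_test, y_test)`.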
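For step 5, a small hypothetical helper can list standardized coefficients from a fitted pipeline. It assumes, as a naming convention rather than a scikit-learn requirement, that the final step is called `clf` and follows a `StandardScaler`, which makes coefficient magnitudes comparable across features. For the L1 model, the `nonzero` column marks the selected features. For L2, read the ranking by `abs_coef`. When comparing models on their cross-validated AUCs, a common heuristic is to prefer the simpler or sparser model when scores are within noise of each other. It also favors L2 when features are strongly correlated, since L1 tends to keep one member of a correlated group somewhat arbitrarily.

```python
import numpy as np
import pandas as pd


def rank_features(model, feature_names, tol=1e-8):
    """Standardized coefficients of a fitted scaler->clf pipeline, largest |coef| first."""
    coef = model.named_steps["clf"].coef_.ravel()  # binary problem: a single row
    table = pd.DataFrame({"feature": feature_names, "coef": coef, "abs_coef": np.abs(coef)})
    table["nonzero"] = table["abs_coef"] > tol     # features an L1 fit kept
    return table.sort_values("abs_coef", ascending=False).reset_index(drop=True)
```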

© 2026 PracHub. All rights reserved.