PracHub

Derive and regularize logistic regression

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in logistic regression theory and regularization, GLM versus OLS assumptions, class-imbalance handling and calibration, temporal validation that avoids leakage, the effect of L1/L2 penalties on correlated features, and choosing a decision threshold that maximizes business expected value. It targets the Machine Learning domain for a Data Scientist role.

Company: Other

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

You are building a churn propensity model with logistic regression. Answer precisely: 1) Starting from a Bernoulli likelihood, derive the logistic regression log‑likelihood and its gradient w.r.t. β. Show how L2 and L1 regularization modify the objective (MLE → MAP); write the new objective and gradients/subgradients. 2) List OLS assumptions for linear regression. For each that is relevant to GLMs (e.g., multicollinearity, omitted variables, measurement error, non‑IID), explain how violations manifest in logistic regression and how regularization/feature engineering or robust inference address them. 3) Your positives are 3% of samples. Compare class‑weighting vs. focal loss vs. threshold‑moving. How does each affect calibration? Describe a calibration check and a recalibration method (Platt vs. isotonic) and when you’d prefer each. 4) Define a temporal validation scheme that avoids leakage. Include: feature freeze date, out‑of‑time test window, and k‑fold strategy compatible with time. Specify the exact splits on a 6‑month dataset. 5) With correlated features, contrast L1 vs. L2 on sparsity, stability, and interpretability. Propose a workflow that yields a sparse, stable model with confidence intervals for odds ratios. 6) Give one business‑aligned decision rule for choosing the score threshold using asymmetric costs, and show how to compute the expected value uplift over a “message all” policy.

Related Interview Questions

  • Design anomaly detection and handle imbalanced logistic regression - Other (medium)
  • Extract companies from noisy text - Other (hard)
  • Evaluate and select K in K-means - Other (medium)
  • Explain SVM kernels and complexity - Other (hard)
  • Compare trees, RF, and gradient boosting - Other (medium)
Other • Oct 13, 2025, 9:49 PM • Data Scientist • Onsite • Machine Learning

Churn Propensity with Logistic Regression: Theory, Validation, and Decisions

Context: You are building a churn propensity model (y ∈ {0,1}) using logistic regression for a subscription business. Positives (churners) are 3% of samples. Answer each part precisely and concisely.

1) Logistic regression likelihood and regularization

  • Starting from a Bernoulli likelihood, derive the logistic regression log-likelihood and its gradient with respect to β.
  • Show how L2 and L1 regularization change the objective from MLE to MAP. Write the new objective and its gradients/subgradients (note: the intercept is typically left unpenalized).
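A minimal NumPy sketch of part 1, assuming a design matrix whose first column is all ones (the intercept β₀). With p_i = σ(x_iᵀβ), the Bernoulli log-likelihood is ℓ(β) = Σ_i [y_i log p_i + (1 - y_i) log(1 - p_i)] = Σ_i [y_i x_iᵀβ - log(1 + exp(x_iᵀβ))], and its gradient is ∇ℓ = Xᵀ(y - p). A Gaussian prior β ~ N(0, λ₂⁻¹I) adds (λ₂/2)‖β‖₂² to the negative log-likelihood (gradient term λ₂β); a Laplace prior adds λ₁‖β‖₁ (subgradient term λ₁·sign(β_j), or any value in [-λ₁, λ₁] at β_j = 0). The function names are illustrative, and both penalties skip β₀.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(beta, X, y, lam2=0.0, lam1=0.0):
    """Penalized negative log-likelihood (negative log-posterior up to a constant).

    beta[0] is the intercept and is not penalized; X[:, 0] is assumed to be all ones.
    """
    z = X @ beta
    # -loglik = sum(log(1 + e^z) - y*z); logaddexp(0, z) computes log(1 + e^z) stably
    nll = np.sum(np.logaddexp(0.0, z) - y * z)
    w = beta[1:]
    return nll + 0.5 * lam2 * (w @ w) + lam1 * np.abs(w).sum()

def gradient(beta, X, y, lam2=0.0, lam1=0.0):
    """(Sub)gradient: X^T (p - y) + lam2 * beta + lam1 * sign(beta), intercept excluded."""
    p = sigmoid(X @ beta)
    g = X.T @ (p - y)
    # np.sign(0) == 0 is a valid element of the L1 subdifferential [-lam1, lam1]
    g[1:] += lam2 * beta[1:] + lam1 * np.sign(beta[1:])
    return g
```

Because the L1 term is not differentiable at zero, the MAP estimate with λ₁ > 0 is usually computed with coordinate descent or proximal gradient rather than plain gradient descent; the subgradient is most useful for checking optimality conditions.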

2) OLS assumptions vs. GLMs

List the OLS assumptions. For each assumption that is relevant to GLMs (e.g., multicollinearity, omitted variables, measurement error, non‑IID), explain how violations manifest in logistic regression and how regularization, feature engineering, or robust inference address them.
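One concrete case of the non-IID item: in a churn panel the same customer typically appears in many monthly snapshots, so rows are correlated and the model-based standard errors from (XᵀWX)⁻¹ tend to be too small. The sketch below assumes NumPy and a hypothetical customer-id array `groups`; it fits an unpenalized logistic regression by Newton's method and compares model-based, heteroskedasticity-robust (HC0 sandwich), and cluster-robust covariance estimates. The function names are illustrative.

```python
import numpy as np

def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Unpenalized logistic regression by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None])   # information matrix X^T W X
        step = np.linalg.solve(H, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def logit_covariances(X, y, beta, groups):
    """Return model-based, HC0 sandwich, and cluster-robust covariance matrices of beta."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    bread = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    scores = X * (y - p)[:, None]                 # per-row score contributions
    hc0 = bread @ (scores.T @ scores) @ bread
    # Cluster-robust: sum scores within each cluster (e.g. customer id) before the outer product
    _, inv = np.unique(groups, return_inverse=True)
    cluster_scores = np.zeros((inv.max() + 1, X.shape[1]))
    np.add.at(cluster_scores, inv, scores)
    clustered = bread @ (cluster_scores.T @ cluster_scores) @ bread
    return bread, hc0, clustered
```

With repeated snapshots per customer, the cluster-robust intervals are usually the more honest ones. When a penalty is also active, one option is to bootstrap by resampling whole customers rather than individual rows.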

3) Class imbalance (3% positives)

  • Compare class weighting vs. focal loss vs. threshold moving. How does each affect calibration?
  • Describe a calibration check and a recalibration method (Platt vs. isotonic), and explain when you would prefer each.

4) Temporal validation without leakage

Define a temporal validation scheme that avoids leakage. Include: feature freeze date, out‑of‑time test window, and a k‑fold strategy compatible with time. Specify the exact splits on a 6‑month dataset.
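One concrete scheme, sketched under assumptions that are not in the question: the data cover 2025-01-01 to 2025-06-30 (hypothetical dates); each monthly snapshot on the 1st is the feature freeze date (features use only events strictly before it); January serves only as feature lookback; and the label is churn within [snapshot, snapshot + 30 days). The last fully labeled snapshot is the out-of-time test and is scored once. Model selection uses expanding-window folds in which a training snapshot is kept only if its label window closes on or before the validation snapshot date, which purges label overlap. `temporal_splits` is an illustrative name.

```python
from datetime import date, timedelta

# Hypothetical setup (assumptions, not from the question)
DATA_END_EXCL = date(2025, 7, 1)                     # data run through 2025-06-30
HORIZON = timedelta(days=30)                         # churn label window length
SNAPSHOTS = [date(2025, m, 1) for m in range(2, 7)]  # Feb 1 .. Jun 1; January is lookback only

def temporal_splits(snapshots, horizon, data_end_excl):
    """Return (expanding-window CV folds, final training snapshots, out-of-time test snapshot)."""
    # Keep only snapshots whose label window is fully observed
    labeled = [s for s in snapshots if s + horizon <= data_end_excl]
    test = labeled[-1]   # out-of-time test: last fully labeled snapshot
    dev = labeled[:-1]   # snapshots available for training and tuning
    folds = []
    for val in dev:
        # Purge: a training snapshot's labels must be known by the validation freeze date
        train = [t for t in dev if t + horizon <= val]
        if train:
            folds.append((train, val))
    final_train = [t for t in dev if t + horizon <= test]
    return folds, final_train, test

if __name__ == "__main__":
    folds, final_train, test = temporal_splits(SNAPSHOTS, HORIZON, DATA_END_EXCL)
    for i, (train, val) in enumerate(folds, 1):
        print(f"fold {i}: train {[str(t) for t in train]} -> validate {val}")
    print(f"final: train {[str(t) for t in final_train]} -> test {test}")
```

Under these assumptions the splits are: fold 1 trains on the Feb 1 and Mar 1 snapshots and validates on Apr 1; fold 2 trains on Feb 1 to Apr 1 and validates on May 1; the final model trains on Feb 1 to May 1 and is tested once on Jun 1 (labels through Jun 30). A Mar 1 validation fold drops out because the Feb 1 label window [Feb 1, Mar 3) overlaps it, a reminder that the purge gap follows the label horizon rather than calendar months.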

5) Correlated features and penalties

With correlated features, contrast L1 vs. L2 on sparsity, stability, and interpretability. Propose a workflow that yields a sparse, stable model with confidence intervals for odds ratios.
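A minimal NumPy sketch of one such workflow, with illustrative names and untuned settings (λ = 0.02, 50 subsamples, and a 0.6 selection-frequency cutoff are assumptions). Split the data in half. On half A, standardize and run stability selection: refit an L1 model (here by proximal gradient, ISTA) on many random subsamples and keep features selected in at least 60% of them. On half B, refit an unpenalized logistic regression on the kept features and report Wald 95% intervals for the odds ratios per one-standard-deviation change. Refitting on data not used for selection avoids the overly narrow intervals that come from reusing the selection data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l1_logistic(X, y, lam, n_iter=500):
    """ISTA for mean log-loss + lam * ||beta||_1; the intercept b0 is not penalized."""
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)   # 1 / Lipschitz bound of the loss gradient
    for _ in range(n_iter):
        r = sigmoid(b0 + X @ beta) - y                  # residuals at the current iterate
        b0 -= step * r.mean()
        z = beta - step * (X.T @ r) / n
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding (L1 prox)
    return b0, beta

def selection_frequency(X, y, lam, n_sub=50, frac=0.5, seed=0):
    """Stability selection: share of random subsamples in which each coefficient is nonzero."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_sub):
        idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
        counts += l1_logistic(X[idx], y[idx], lam)[1] != 0
    return counts / n_sub

def refit_odds_ratios(X, y, n_iter=50, z=1.96):
    """Unpenalized Newton refit; returns odds ratios and Wald CI bounds (intercept dropped)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = sigmoid(A @ beta)
        beta += np.linalg.solve(A.T @ (A * (p * (1 - p))[:, None]), A.T @ (y - p))
    p = sigmoid(A @ beta)
    se = np.sqrt(np.diag(np.linalg.inv(A.T @ (A * (p * (1 - p))[:, None]))))
    b, s = beta[1:], se[1:]
    return np.exp(b), np.exp(b - z * s), np.exp(b + z * s)

def sparse_stable_workflow(X, y, lam=0.02, cutoff=0.6, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    half_a, half_b = perm[: len(y) // 2], perm[len(y) // 2:]
    Xs = (X - X[half_a].mean(axis=0)) / X[half_a].std(axis=0)   # standardize with half-A stats
    freq = selection_frequency(Xs[half_a], y[half_a], lam, seed=seed)
    selected = np.flatnonzero(freq >= cutoff)
    odds, lo, hi = refit_odds_ratios(Xs[half_b][:, selected], y[half_b])
    return selected, freq, odds, lo, hi
```

When two features are nearly collinear, L1 can pick one of them somewhat arbitrarily in each subsample, so both selection frequencies may fall below the cutoff even though the pair matters; inspecting `freq` for correlated groups, or swapping elastic net in for pure L1 inside the loop, helps catch this.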

6) Business decision rule and value

Give one business‑aligned decision rule for choosing the score threshold using asymmetric costs, and show how to compute the expected value uplift over a "message all" policy.
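A sketch of one such rule under a hypothetical cost model (all parameters are assumptions): contacting a customer costs c, a contacted churner is retained with probability s, and a retained churner preserves value V. Contacting customer i is then worth p_i·s·V - c in expectation, so the rule is to contact when p_i ≥ p* = c / (s·V), which requires calibrated p_i (part 3). "Message all" earns Σ_i (p_i·s·V - c), so the uplift of the threshold policy is the sum, over customers with p_i < p*, of (c - p_i·s·V): the expected loss avoided by not contacting them. `ev_threshold_policy` is an illustrative name.

```python
import numpy as np

def ev_threshold_policy(p, value_saved, save_rate, contact_cost):
    """Expected value of 'contact iff p >= p*' versus 'message all' (hypothetical cost model).

    Contacting costs contact_cost; a contacted churner is retained with probability
    save_rate, preserving value_saved. Scores p must be calibrated probabilities.
    """
    p = np.asarray(p, dtype=float)
    net = p * save_rate * value_saved - contact_cost       # expected net value of contacting each customer
    threshold = contact_cost / (save_rate * value_saved)   # net >= 0  <=>  p >= threshold
    ev_all = net.sum()                                     # "message all" baseline
    ev_policy = net[p >= threshold].sum()                  # contact only above the threshold
    return threshold, ev_policy, ev_all, ev_policy - ev_all
```

For example, with V = 600, s = 0.2, and c = 5 (hypothetical), p* = 5/120 ≈ 0.042; scores [0.01, 0.03, 0.05, 0.20] give an expected value of 20.0 for the threshold policy versus 14.8 for messaging everyone, an uplift of 5.2.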


© 2026 PracHub. All rights reserved.