PracHub

Derive logistic regression and thresholds

Last updated: Mar 29, 2026

Quick Overview

This question tests core logistic-regression competencies: deriving the Bernoulli log-likelihood, the gradient and Hessian of the L2-regularized objective, convexity reasoning, probability calibration and decision thresholds under asymmetric costs, numerically stable log-sigmoid expressions, and the analytic effects of class imbalance. It is filed under Statistics & Math for data scientist roles and is commonly asked because it pairs theoretical derivation (gradients, convexity proofs) with applied judgment about thresholds, calibration, numerical stability, and regularization.

  • hard
  • Snapchat
  • Statistics & Math
  • Data Scientist

Derive logistic regression and thresholds

Company: Snapchat

Role: Data Scientist

Category: Statistics & Math

Difficulty: hard

Interview Round: Onsite


Snapchat · Data Scientist · Onsite · Statistics & Math · Oct 13, 2025

Logistic Regression Deep Dive (Binary Classification)

Assume a binary classification setting with observations {(x_i, y_i)} for i=1..n, where x_i ∈ R^p (with an intercept term) and y_i ∈ {0,1}. Let η_i = x_i^T β and σ(z) be the logistic (sigmoid) function.

Tasks

  1. Write σ(z) and the Bernoulli log-likelihood for binary logistic regression. Derive the gradient and Hessian with respect to β for L2-regularized logistic regression, and explain why the (penalized) objective is convex.
  2. Single-feature example: with x ∈ R, β0 = −1.2 and β1 = 0.8, compute P(y=1 | x=2.0). Report the odds and the odds ratio for a one-unit increase in x.
  3. Decision threshold with costs: the positive-class base rate is 2%, and a false negative costs 10× a false positive. Compute the Bayes-optimal decision threshold and explain how you would calibrate probabilities (e.g., Platt scaling vs. isotonic).
  4. Numerical stability: give numerically stable expressions for log(σ(z)) and log(1 − σ(z)) and explain why they avoid overflow/underflow.
  5. Class imbalance: explain how severe class imbalance affects MLE estimates and which regularization or reweighting you would use. Justify analytically.

Solution
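For task 1, with η_i = x_i^T β, the penalized negative log-likelihood is J(β) = Σ_i [log(1 + e^{η_i}) − y_i η_i] + (λ/2)‖β‖², its gradient is ∇J = X^T(σ(Xβ) − y) + λβ, and its Hessian is H = X^T W X + λI with W = diag(σ(η_i)(1 − σ(η_i))). Since X^T W X is positive semidefinite and λI is positive definite, H ≽ λI ≻ 0 and the penalized objective is strictly convex. A minimal NumPy sketch (function and variable names are mine, not from the original) that verifies the analytic gradient against finite differences and checks H ≽ λI:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(beta, X, y, lam):
    """J(b) = sum_i [log(1 + e^{eta_i}) - y_i * eta_i] + (lam/2) ||b||^2."""
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - y * eta) + 0.5 * lam * beta @ beta

def gradient(beta, X, y, lam):
    """grad J = X^T (sigma(X b) - y) + lam * b."""
    return X.T @ (sigmoid(X @ beta) - y) + lam * beta

def hessian(beta, X, y, lam):
    """H = X^T W X + lam I, with W = diag(p_i (1 - p_i))."""
    p = sigmoid(X @ beta)
    return (X * (p * (1 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])

# Finite-difference check of the analytic gradient on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
beta, lam, eps = rng.normal(size=3), 0.1, 1e-6
g_num = np.array([
    (objective(beta + eps * e, X, y, lam)
     - objective(beta - eps * e, X, y, lam)) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(g_num, gradient(beta, X, y, lam), atol=1e-4)
# Convexity: H >= lam * I, so its smallest eigenvalue is at least lam.
assert np.linalg.eigvalsh(hessian(beta, X, y, lam)).min() >= lam - 1e-8
```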
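For task 2 the arithmetic is direct: η = β0 + β1·x = −1.2 + 0.8·2.0 = 0.4, so P(y=1 | x=2.0) = σ(0.4) ≈ 0.599, the odds are e^{0.4} ≈ 1.492, and the odds ratio for a one-unit increase in x is e^{β1} = e^{0.8} ≈ 2.226. As a quick check:

```python
import math

beta0, beta1, x = -1.2, 0.8, 2.0
eta = beta0 + beta1 * x                # -1.2 + 1.6 = 0.4
p = 1.0 / (1.0 + math.exp(-eta))      # sigma(0.4)
odds = math.exp(eta)                  # equals p / (1 - p)
odds_ratio = math.exp(beta1)          # multiplicative change in odds per unit x

print(round(p, 4), round(odds, 4), round(odds_ratio, 4))
# -> 0.5987 1.4918 2.2255
```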
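For task 3, predict positive when the expected cost of calling negative exceeds that of calling positive: p·C_FN ≥ (1 − p)·C_FP, i.e. p ≥ C_FP / (C_FP + C_FN) = 1/11 ≈ 0.091. The 2% base rate does not change this threshold when p is well calibrated; it only means few examples will clear it. For calibration, Platt scaling fits a two-parameter sigmoid to held-out scores (robust with little data), while isotonic regression fits a nonparametric monotone map (more flexible, but needs more data and can overfit small calibration sets). The threshold itself, assuming calibrated scores:

```python
c_fp, c_fn = 1.0, 10.0   # a false negative costs 10x a false positive
# Predict positive iff p * c_fn >= (1 - p) * c_fp  =>  p >= c_fp / (c_fp + c_fn)
threshold = c_fp / (c_fp + c_fn)
print(threshold)         # -> 0.09090909090909091
```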
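For task 4, the stable identities are log σ(z) = −log(1 + e^{−z}) and log(1 − σ(z)) = log σ(−z) = −log(1 + e^{z}). Computing these via a stable log-sum-exp (NumPy's `logaddexp` internally uses the larger argument plus a `log1p` correction) never exponentiates a large positive number, so there is no overflow, and for large |z| the result degrades gracefully to ≈ −|z| instead of underflowing to log(0) = −∞. A sketch:

```python
import numpy as np

def log_sigmoid(z):
    """log(sigma(z)) = -log(1 + e^{-z}) = -logaddexp(0, -z), stable for all z."""
    return -np.logaddexp(0.0, -z)

def log1m_sigmoid(z):
    """log(1 - sigma(z)) = log(sigma(-z)) = -logaddexp(0, z)."""
    return -np.logaddexp(0.0, z)

z = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
# A naive np.log(1/(1+np.exp(-z))) overflows/underflows at |z| = 1000;
# these expressions stay finite and accurate.
assert np.all(np.isfinite(log_sigmoid(z)))
assert np.all(np.isfinite(log1m_sigmoid(z)))
assert abs(log_sigmoid(-1000.0) + 1000.0) < 1e-6    # log sigma(-1000) ~ -1000
assert abs(log_sigmoid(0.0) + np.log(2.0)) < 1e-12  # log sigma(0) = -log 2
```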
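For task 5: with a ~2% base rate, the variance of β̂ is driven by the scarce minority class, and (quasi-)separation can push the unpenalized MLE coefficients toward infinity. The L2 penalty, equivalent to a Gaussian prior on β, keeps the estimates finite; inverse-frequency class weights rescale the score equations so both classes contribute comparably, at the cost of shifting the fitted intercept by approximately log(w₁/w₀), so probabilities must be recalibrated afterwards if calibrated scores are needed. An illustrative weighted Newton solver under these assumptions (all names are mine, not a library API):

```python
import numpy as np

def fit_weighted_ridge_logit(X, y, lam=1.0, n_iter=25):
    """Newton's method for L2-penalized logistic regression with
    inverse-frequency class weights (illustrative sketch)."""
    n, d = X.shape
    # Weight each class inversely to its frequency: w_j = n / (2 * n_j).
    w = np.where(y == 1, n / (2 * y.sum()), n / (2 * (n - y.sum())))
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        g = X.T @ (w * (p - y)) + lam * beta                 # weighted gradient
        H = (X * (w * p * (1 - p))[:, None]).T @ X + lam * np.eye(d)
        beta -= np.linalg.solve(H, g)                        # Newton step
    return beta, g

# Toy data with rare positives (~2-3% base rate): intercept + one feature.
rng = np.random.default_rng(1)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-3.9 + 0.8 * X[:, 1])))).astype(float)
beta, g = fit_weighted_ridge_logit(X, y, lam=1.0)
# The ridge penalty keeps coefficients finite despite the imbalance,
# and Newton converges (gradient ~ 0 at the optimum).
assert np.all(np.isfinite(beta))
assert np.linalg.norm(g) < 1e-6
```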
