Practical ML questions (classification and generalization)
Answer the following ML engineering/data science questions.
A) Class imbalance
You’re training a classifier where the positive class is rare.
- How do you handle **class imbalance** (data-level and algorithm-level approaches)?
- Which **evaluation metrics** are appropriate and why (e.g., accuracy vs precision/recall/F1/ROC-AUC/PR-AUC)?
- What pitfalls should you watch for (e.g., calibration, thresholding, leakage)?
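A minimal sketch of one answer to the questions above: an algorithm-level fix (class weighting) evaluated with imbalance-aware metrics. The synthetic dataset, the logistic-regression model, and the 0.5 threshold are illustrative assumptions, not part of the questions.

```python
# Sketch: class weighting + PR-AUC on a synthetic imbalanced problem.
# All dataset/model choices here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score, f1_score

# ~5% positives: plain accuracy would look great for an "always negative" model
X, y = make_classification(n_samples=5000, weights=[0.95], flip_y=0.01,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights the loss inversely to class frequency
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

pr_auc = average_precision_score(y_te, scores)  # threshold-free, imbalance-aware
f1 = f1_score(y_te, scores > 0.5)               # threshold-dependent
print(f"PR-AUC={pr_auc:.3f}  F1@0.5={f1:.3f}")
```

Note that PR-AUC's random baseline equals the positive prevalence (~0.05 here), which makes it far more informative than accuracy or even ROC-AUC when positives are rare; the 0.5 threshold used for F1 is exactly the kind of arbitrary choice the "thresholding" pitfall refers to.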
B) Training on a sample from a very large dataset
You train a model on a sample drawn from a massive dataset.
- How do you verify the **sample is representative**?
- How do you validate that a model trained on the sample will **generalize** to the full population?
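One common representativeness check for the first question above is to compare each feature's marginal distribution in the sample against the full dataset, e.g. with a two-sample Kolmogorov-Smirnov test. The synthetic "population" below is a stand-in assumption; in practice you would stream or subsample the real full dataset.

```python
# Sketch: per-feature KS test comparing a sample to the full dataset.
# The population array is a synthetic stand-in for a massive real dataset.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
population = rng.normal(size=(100_000, 3))                 # pretend full data
idx = rng.choice(100_000, size=5_000, replace=False)
sample = population[idx]                                    # uniform random sample

results = []
for j in range(population.shape[1]):
    stat, p = ks_2samp(population[:, j], sample[:, j])
    results.append((stat, p))
    # A large KS statistic (small p) flags a feature whose marginal
    # distribution differs between sample and population.
    print(f"feature {j}: KS stat={stat:.4f}, p={p:.3f}")
```

Marginal checks like this are necessary but not sufficient (they miss joint-distribution shifts); a complementary tactic is to hold out a slice of the full data that the sample never touched and confirm the sample-trained model's metrics on it match its cross-validation metrics.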
C) Preventing overfitting in tree-based models
For decision trees / random forests / gradient-boosted trees:
- What knobs and practices help prevent **overfitting**?
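The usual knobs are depth limits, minimum leaf sizes, feature subsampling, and (for boosting) learning rate and early stopping. A sketch comparing an unconstrained random forest to a constrained one on the train-test accuracy gap; the dataset and the specific hyperparameter values are illustrative, not tuned.

```python
# Sketch: regularization knobs for tree ensembles, shown via the
# train-test accuracy gap. Values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "unconstrained": RandomForestClassifier(random_state=0),
    "constrained": RandomForestClassifier(
        max_depth=6,          # cap tree depth
        min_samples_leaf=10,  # require enough samples per leaf
        max_features="sqrt",  # feature subsampling decorrelates trees
        n_estimators=200,
        random_state=0,
    ),
}

gaps = {}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    gaps[name] = m.score(X_tr, y_tr) - m.score(X_te, y_te)
    print(f"{name}: train-test gap = {gaps[name]:.3f}")
```

A shrinking gap under constraints is the signature of reduced overfitting; for gradient-boosted trees the analogous knobs would be `learning_rate`, `subsample`, and early stopping on a validation set.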
D) Why L1/L2 regularization is biased
Explain why L1 (Lasso) and L2 (Ridge) regularization typically produce biased coefficient estimates, and why we still use them.
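The bias can be demonstrated empirically: averaging fitted coefficients over many resampled datasets approximates the estimator's expectation, and the ridge average sits systematically below the true coefficient while OLS does not. The single-feature setup, the true coefficient of 2.0, and `alpha=50` are illustrative assumptions.

```python
# Sketch: ridge shrinkage bias. With one standardized feature, the ridge
# coefficient is roughly the OLS coefficient scaled by n/(n + alpha),
# so its expectation is pulled toward zero. Setup values are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
true_beta = 2.0
coefs_ols, coefs_ridge = [], []
for _ in range(500):  # repeat fits to estimate each estimator's expectation
    X = rng.normal(size=(100, 1))
    y = true_beta * X[:, 0] + rng.normal(size=100)
    coefs_ols.append(LinearRegression().fit(X, y).coef_[0])
    coefs_ridge.append(Ridge(alpha=50.0).fit(X, y).coef_[0])

mean_ols = np.mean(coefs_ols)      # ≈ true_beta: OLS is unbiased
mean_ridge = np.mean(coefs_ridge)  # systematically below true_beta: biased
print(f"mean OLS coef:   {mean_ols:.3f}")
print(f"mean ridge coef: {mean_ridge:.3f}")
```

This is the bias-variance trade-off in action: the ridge estimates are biased toward zero, but their variance across resamples is smaller, which is why the penalized estimator can still have lower expected prediction error, especially with correlated or high-dimensional features.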