MLE Knowledge Collection

Author: PracHub

Published: 8/3/2025

Quick Overview

This collection covers core machine learning interview topics including ML fundamentals (overfitting and underfitting, bias–variance tradeoff), regression, regularization techniques, evaluation metrics, model comparison methods, cross-validation, and practical strategies to prevent overfitting, presented as common interview questions with detailed answers organized by topic. It is a topical Q&A study guide intended for machine learning engineers and other practitioners preparing for technical interviews or reviewing core ML concepts and evaluation methods.

A comprehensive collection of common machine learning interview questions and detailed answers, organized by topic.

Table of Contents

  1. ML Fundamentals
  2. Regression
  3. Regularization
  4. Evaluation Metrics

ML Fundamentals

1. What are Overfitting and Underfitting?

Underfitting:

  • Occurs when a machine learning model is too simple to capture the underlying patterns in the data
  • Model performs poorly on both training and new unseen data
  • Characterized by high training and validation errors
  • Solutions:
    • Use more complex models
    • Add more relevant features
    • Reduce regularization strength

Overfitting:

  • Occurs when a model becomes too complex and starts memorizing training data instead of learning generalizable patterns
  • Training error is significantly lower than validation error
  • Model performs poorly on new, unseen data
  • Solutions:
    • Reduce model complexity
    • Apply regularization techniques (L1, L2, dropout)
    • Use cross-validation for model selection
    • Collect more training data
    • Apply data augmentation
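
The symptoms listed above can be reproduced on toy data. The following is a minimal sketch, not a benchmark: the data (y = 2x + 1 plus Gaussian noise), the three toy models, and all function names are assumptions chosen for illustration. A constant predictor stands in for an underfit model, a 1-nearest-neighbor lookup for an overfit one.

```python
import random

def fit_mean(xs, ys):
    """Too simple: always predicts the training mean (tends to underfit)."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_1nn(xs, ys):
    """Too flexible: returns the label of the nearest training point (memorizes)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
train_x = [i / 10 for i in range(30)]
val_x = [i / 10 + 0.05 for i in range(30)]
train_y = [2 * x + 1 + rng.gauss(0, 0.5) for x in train_x]
val_y = [2 * x + 1 + rng.gauss(0, 0.5) for x in val_x]

errors = {}
for name, fit in [("mean", fit_mean), ("linear", fit_line), ("1-NN", fit_1nn)]:
    model = fit(train_x, train_y)
    errors[name] = (mse(model, train_x, train_y), mse(model, val_x, val_y))
    print(f"{name:>6}: train MSE={errors[name][0]:.3f}  val MSE={errors[name][1]:.3f}")
```

The constant predictor has high error on both sets, while the 1-NN model has zero training error and a clearly higher validation error, which is the gap that signals overfitting.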

2. What is the Bias-Variance Tradeoff?

Bias:

  • The gap between the model's average prediction (averaged over possible training sets) and the true value it is trying to predict
  • Occurs when the model oversimplifies underlying patterns and makes strong assumptions
  • Leads to underfitting where the model fails to capture true relationships between features and target variables

Variance:

  • Measures how much the model's predictions vary around their own average when the training data changes
  • High variance models are sensitive to specific data points and may memorize noise or outliers
  • Leads to overfitting

The Tradeoff:

  • Low variance models tend to be less complex with simple structure → can lead to high bias
  • Low bias models tend to be more complex with flexible structure → can lead to high variance
  • Decreasing one component often increases the other
  • Goal: Find the right balance between bias and variance for optimal model performance
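
For squared-error loss the tradeoff can be written as a decomposition. The notation here is an assumption introduced for illustration: the target is y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², f̂ is the model fitted on a random training set, and expectations are taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model, which is why reducing one at the expense of the other can still lower total error.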

3. What are Common Methods to Prevent Overfitting?

  1. Model Complexity Reduction

    • Use simpler models
    • Reduce the number of parameters
  2. Regularization Techniques

    • L1 regularization (Lasso)
    • L2 regularization (Ridge)
    • Dropout (for neural networks)
    • Cross-validation to choose the regularization strength
  3. Early Stopping

    • Stop training when validation performance stops improving
  4. Data-based Approaches

    • Collect more training data
    • Data augmentation
    • Remove noisy features
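
Early stopping is the easiest of these to show in a few lines. The sketch below assumes a patience-based rule (stop after a fixed number of epochs without improvement); the function name, the `patience` and `min_delta` parameters, and the loss history are all illustrative.

```python
def early_stopping(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # stop here, keep weights from best_epoch
    return best_epoch, len(val_losses) - 1  # rule never triggered: ran all epochs

history = [1.00, 0.80, 0.70, 0.72, 0.71, 0.73, 0.69]
best_epoch, stop_epoch = early_stopping(history, patience=2)
print(best_epoch, stop_epoch)
```

With `patience=2` training stops at epoch 4 and keeps the epoch-2 weights; a larger patience would have reached the slightly better loss at epoch 6, which is the usual tension when tuning this parameter.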

4. How to Determine if One Model is Better Than Another?

Given a set of ground truths and two models:

  1. Evaluation Metrics

    • Choose appropriate metrics based on the problem type
    • Compare performance across multiple metrics
  2. Cross-Validation

    • Split data into multiple folds
    • Train each model on all folds but one and test on the held-out fold, rotating until every fold has been held out once
    • Evaluate average performance across all folds
  3. Statistical Testing

    • Hypothesis testing to determine if performance differences are statistically significant
    • A/B testing in production environments
  4. Domain Expertise

    • Consider business requirements
    • Evaluate model interpretability
    • Assess computational efficiency
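
The cross-validation step can be sketched as follows. This is a simplified illustration under stated assumptions: folds are formed by index modulo k rather than by shuffling, the two "models" are a constant predictor and a one-feature least-squares line, and the data is a deterministic toy set. All names are hypothetical.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx); fold f holds out indices with i % k == f."""
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        yield train, test

def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + b * (x - mx)

def cross_val_mse(fit, xs, ys, k=5):
    """Return (average held-out MSE, list of per-fold MSEs)."""
    fold_scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_scores.append(sum((model(xs[i]) - ys[i]) ** 2 for i in test) / len(test))
    return sum(fold_scores) / k, fold_scores

xs = [float(i) for i in range(20)]
ys = [2 * x + 1 + 0.1 * (-1) ** i for i, x in enumerate(xs)]  # toy data, deterministic "noise"

mean_cv, mean_folds = cross_val_mse(fit_mean, xs, ys)
line_cv, line_folds = cross_val_mse(fit_line, xs, ys)
print(f"mean predictor: {mean_cv:.3f}   linear model: {line_cv:.3f}")
```

Because both models are scored on the same folds, the per-fold scores are paired, and those paired differences are what a significance test would be run on.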

Regression

1. What are the Basic Assumptions of Linear Regression?

  1. Linearity: There is a linear relationship between independent variables (X) and dependent variable (y)

  2. Independence: No relationship or correlation between the errors (residuals) of different observations

  3. Normality: The residuals are normally distributed

  4. Homoscedasticity: The variability of errors (residuals) is constant across all levels of independent variables

  5. No Multicollinearity: Independent variables are not highly correlated with each other

2. What Happens with Correlated Variables? How to Solve?

Problems with Correlated Variables:

  • Unstable coefficient estimates
  • Unreliable significance tests
  • Difficulties interpreting individual variable contributions
  • Inflated standard errors

Solutions:

  • Feature selection (remove redundant features)
  • Ridge regression (L2 regularization)
  • Principal Component Analysis (PCA)
  • Feature engineering to create uncorrelated features
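
One common way to detect the problem before choosing a fix is the variance inflation factor, VIF = 1 / (1 − R²), where R² comes from regressing one predictor on the others. VIF is not mentioned in the list above; it is added here as an illustration, and the sketch covers only the two-predictor case, where R² is the squared Pearson correlation. The data and names are made up.

```python
import math

def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 is their squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1.0, 2.0, 3.0, 4.0]
nearly_duplicate = [2.1, 3.9, 6.2, 7.8]   # roughly 2 * x1
unrelated = [1.0, -1.0, -1.0, 1.0]        # zero correlation with x1

vif_high = vif_two_predictors(x1, nearly_duplicate)
vif_low = vif_two_predictors(x1, unrelated)
print(f"VIF (near duplicate): {vif_high:.1f}   VIF (unrelated): {vif_low:.1f}")
```

A VIF near 1 means the predictor adds independent information; values above roughly 5 to 10 are a frequently used rule of thumb for problematic collinearity.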

3. Explain Regression Coefficients

  • Coefficients represent the change in the dependent variable associated with a one-unit change in the corresponding independent variable, while holding other variables constant
  • Interpretation example: If β₁ = 2.5, then a one-unit increase in X₁ is associated with a 2.5-unit increase in the predicted y, assuming all other variables remain constant
  • Important: Interpretation should be done with caution and within the context of the specific model and dataset

4. Relationship Between Minimizing Squared Error and Maximizing Likelihood

  • In linear regression with Gaussian error assumptions, minimizing squared error is equivalent to maximizing the likelihood of observed data
  • This connection arises because the squared error can be derived from the likelihood function assuming Gaussian errors
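  • When the Gaussian assumption doesn't hold (e.g., heavy-tailed or heteroscedastic errors), least squares is no longer the maximum likelihood estimator; with Laplace-distributed errors, for example, maximum likelihood corresponds to minimizing absolute error instead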
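
The equivalence can be derived in a few lines. The notation is assumed for illustration: n observations, feature vectors xᵢ, coefficients β, and i.i.d. Gaussian errors with fixed variance σ².

```latex
\begin{align*}
y_i &= x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \\
\log L(\beta) &= \sum_{i=1}^{n} \log\!\left[ \frac{1}{\sqrt{2\pi\sigma^2}}
      \exp\!\left( -\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2} \right) \right] \\
  &= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 \\
\arg\max_{\beta} \log L(\beta) &= \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2
\end{align*}
```

The first term does not depend on β and the second is a negative multiple of the squared error, so maximizing the likelihood and minimizing squared error select the same β.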

5. How to Minimize Inter-correlation Between Variables?

  1. Feature Selection: Remove highly correlated features
  2. PCA: Transform features into uncorrelated principal components
  3. Ridge Regression: Handles multicollinearity through L2 regularization
  4. Feature Engineering: Create new uncorrelated features from existing ones

6. Can Linear Regression Handle Non-linear Relationships?

A straight-line fit on the raw features cannot accurately capture non-linear relationships, but because the model only needs to be linear in its coefficients, you can:

  1. Add Interaction Terms: X₁ × X₂ to capture interaction effects
  2. Polynomial Features: Add X², X³, etc.
  3. Piecewise Linear Regression: Different linear models for different regions
  4. Transform Variables: Log, square root, or other transformations
  5. Switch to Non-linear Models: If relationship is strongly non-linear
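
The polynomial-feature option can be illustrated with a toy quadratic. In this sketch the data (y = 3 + 2x²) and the helper `fit_line` are assumptions for illustration; the same one-feature least-squares routine is applied first to x and then to the transformed feature x².

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [3 + 2 * x ** 2 for x in xs]          # toy quadratic relationship

raw_intercept, raw_slope = fit_line(xs, ys)                    # misses the curve
sq_intercept, sq_slope = fit_line([x ** 2 for x in xs], ys)    # linear in x^2

print(f"raw x:   y = {raw_intercept:.1f} + {raw_slope:.1f} * x")
print(f"x^2:     y = {sq_intercept:.1f} + {sq_slope:.1f} * x^2")
```

On the raw feature the fitted slope is zero because the data is symmetric around x = 0; on the squared feature the same linear method recovers the generating coefficients.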

7. Why Use Interaction Variables?

  1. Capture Non-Additive Effects: When the effect of one variable depends on another
  2. Improved Model Fit: Better representation of complex relationships
  3. Context-Specific Relationships: Model how relationships change under different conditions
  4. Avoid Omitted Variable Bias: Leaving out a real interaction effect can bias the estimates of the remaining coefficients
  5. Enhanced Interpretability: Understand how variables interact
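
A two-variable model makes the non-additive effect concrete. The symbols below (β₀ through β₃, x₁, x₂) are illustrative notation rather than anything defined earlier in this guide.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon \\
\frac{\partial\, \mathbb{E}[y]}{\partial x_1} &= \beta_1 + \beta_3 x_2
\end{align*}
```

With β₃ = 0 the effect of x₁ is the constant β₁; with β₃ ≠ 0 it changes with the level of x₂, which is what an interaction term is meant to capture.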

Regularization

1. L1 vs L2 Regularization: Differences

L1 Regularization (Lasso):

  • Adds the sum of absolute values of parameters to loss function
  • Formula: ||β||₁ = Σ|βᵢ|
  • Can shrink coefficients to exactly zero
  • Produces sparse models (feature selection)

L2 Regularization (Ridge):

  • Adds the sum of squared parameters to loss function
  • Formula: ||β||₂² = Σβᵢ²
  • Shrinks coefficients towards zero but not exactly zero
  • Keeps all features but with reduced impact

2. Lasso Regression

  • Full name: Least Absolute Shrinkage and Selection Operator
  • Objective function: L = ||ŷ - y||₂² + λ||β||₁
  • Where ŷ = f_β(x) is the prediction
  • Can drive coefficients to exactly zero when λ is sufficiently large
  • Useful for automatic feature selection
  • Creates sparse models
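
With a single feature and no intercept the Lasso solution has a closed form, which makes the exact-zero behavior easy to see. The sketch below assumes the objective Σ(yᵢ − βxᵢ)² + λ|β|; libraries often scale these terms differently, so the λ values are not comparable to any particular package. Function names and data are illustrative.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t, and return exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_one_feature(xs, ys, lam):
    """Minimizer of sum((y - beta*x)^2) + lam * |beta| for a single feature."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam / 2.0) / sxx

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # y = 2x exactly, so the unpenalized slope is 2
betas = {lam: lasso_one_feature(xs, ys, lam) for lam in (0.0, 14.0, 56.0, 100.0)}
print(betas)
```

The coefficient falls linearly as λ grows and is exactly zero once λ/2 reaches |Σxᵢyᵢ| (here, λ ≥ 56).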

3. Ridge Regression

  • Linear regression with L2 regularization
  • Objective function: L = ||ŷ - y||₂² + λ||β||₂²
  • Higher λ values result in more aggressive shrinkage
  • All features retained but with reduced coefficients
  • Handles multicollinearity well
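
The shrinkage behavior can be checked on a single feature without an intercept, where the Ridge solution is β = Σxᵢyᵢ / (Σxᵢ² + λ). This sketch assumes the objective Σ(yᵢ − βxᵢ)² + λβ²; the function name and data are illustrative.

```python
def ridge_one_feature(xs, ys, lam):
    """Minimizer of sum((y - beta*x)^2) + lam * beta^2 for a single feature."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # y = 2x exactly, so the unpenalized slope is 2
betas = {lam: ridge_one_feature(xs, ys, lam) for lam in (0.0, 14.0, 56.0, 1000.0)}
print(betas)
```

The coefficient keeps shrinking as λ grows but stays non-zero for any finite λ, because λ only enlarges the denominator.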

4. Why is L1 Sparse but L2 is Not?

  • Geometric interpretation:
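    • L1 norm creates diamond-shaped constraint regions whose corners lie on the coordinate axes
    • L2 norm creates circular (ball-shaped) constraint regions
  • The optimization solution often hits the corners of the L1 diamond (where one or more coefficients are exactly zero)
  • For L2, the solution typically hits a point on the sphere where all coefficients are non-zero
  • The L1 penalty is not differentiable at zero, and its gradient elsewhere has constant magnitude λ no matter how small the coefficient is, so small coefficients are pushed all the way to zero; the L2 gradient shrinks in proportion to the coefficient, so the pull fades before reaching zero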

5. Why Does Regularization Work?

  • Adds constraints to the coefficient values
  • Reduces model complexity by penalizing large coefficients
  • Reduces variance at the cost of slightly increased bias
  • Prevents overfitting by discouraging the model from fitting noise
  • Handles multicollinearity: L2 in particular distributes weight among correlated features instead of letting their coefficients grow large with opposite signs

6. Why Use L1/L2 Instead of L3/L4?

  1. Mathematical Properties: L1 and L2 have well-studied properties that align with regularization goals
  2. Computational Simplicity: Higher-order norms increase complexity without significant benefits
  3. Interpretability: L1 (sparsity) and L2 (smoothness) have clear interpretations
  4. Empirical Success: L1 and L2 have proven effective in practice
  5. Optimization: Efficient algorithms exist for L1 and L2 regularization

Evaluation Metrics

1. Precision and Recall Trade-off

Precision:

  • Measures how many positive predictions are actually true positives
  • Formula: Precision = TP / (TP + FP)
  • Focuses on the quality of positive predictions
  • High precision = low false positives

Recall (Sensitivity):

  • Measures how many actual positives are correctly identified
  • Formula: Recall = TP / (TP + FN)
  • Emphasizes completeness of positive predictions
  • High recall = low false negatives

Trade-off:

  • Improving one metric often decreases the other
  • High precision, low recall: Conservative in predicting positives, few false positives but may miss true positives
  • Low precision, high recall: Liberal in predicting positives, captures most true positives but generates more false positives
  • Choice depends on the cost of false positives vs. false negatives
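
The trade-off shows up as soon as the decision threshold moves. In this sketch the labels, scores, and thresholds are made-up illustrative values, and a prediction is counted as positive when its score is at or above the threshold.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.10]

strict = precision_recall(labels, scores, threshold=0.75)   # conservative
lenient = precision_recall(labels, scores, threshold=0.35)  # liberal
print(f"strict:  precision={strict[0]:.2f} recall={strict[1]:.2f}")
print(f"lenient: precision={lenient[0]:.2f} recall={lenient[1]:.2f}")
```

Lowering the threshold from 0.75 to 0.35 raises recall from 0.50 to 1.00 while precision drops from 0.67 to 0.57.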

2. Metrics for Imbalanced Data

  1. Precision and Recall: More informative than accuracy for imbalanced datasets
  2. F1-Score: Harmonic mean of precision and recall, provides balanced evaluation
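  3. Area Under Precision-Recall Curve (AUPRC): Focuses on the positive class; its baseline equals the positive-class prevalence, so it exposes weak performance on a rare class
  4. ROC-AUC: Area under ROC curve, quantifies discriminative power; can look optimistic under heavy imbalance because FPR is computed over the large negative class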
  5. Matthews Correlation Coefficient (MCC): Considers all confusion matrix elements
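
A small numeric example shows why accuracy alone is misleading here. The counts below are invented for illustration (10 positives among 1,000 examples), and returning 0.0 when a metric is undefined is a convention assumed by this sketch.

```python
import math

def summarize(tp, fp, fn, tn):
    """Accuracy, F1, and MCC from confusion-matrix counts (0.0 when undefined)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}

always_negative = summarize(tp=0, fp=0, fn=10, tn=990)   # never predicts the rare class
some_skill = summarize(tp=5, fp=5, fn=5, tn=985)         # finds half of the positives

print(always_negative)
print(some_skill)
```

Both classifiers reach 99% accuracy, but F1 and MCC are 0 for the one that never predicts a positive and roughly 0.5 for the other.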

3. Choosing Classification Metrics

Consider:

  1. Problem understanding: Importance of correctly classifying each class
  2. Class imbalance: Use appropriate metrics for imbalanced data
  3. Business impact: Cost of false positives vs. false negatives
  4. Domain knowledge: Industry-specific requirements
  5. Multiple metrics: Often need to evaluate multiple aspects

4. Confusion Matrix

A table showing classification results:

  • True Positives (TP): Correctly predicted positive cases
  • True Negatives (TN): Correctly predicted negative cases
  • False Positives (FP): Incorrectly predicted as positive
  • False Negatives (FN): Incorrectly predicted as negative

From these, derive: Accuracy, Precision, Recall, F1-Score
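
Counting the four cells takes only a few lines. The sketch assumes binary labels encoded as 1 (positive) and 0 (negative); the function name and example labels are illustrative.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    names = {(1, 1): "TP", (0, 0): "TN", (0, 1): "FP", (1, 0): "FN"}
    counts = Counter(names[(t, p)] for t, p in zip(y_true, y_pred))
    return {k: counts[k] for k in ("TP", "TN", "FP", "FN")}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

cm = confusion_counts(y_true, y_pred)
accuracy = (cm["TP"] + cm["TN"]) / len(y_true)
print(cm, accuracy)
```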

5. TPR, FPR, and ROC

True Positive Rate (TPR):

  • Also called Sensitivity or Recall
  • TPR = TP / (TP + FN) = TP / (All Actual Positives)
  • Measures classifier's ability to identify positive instances

False Positive Rate (FPR):

  • FPR = FP / (FP + TN) = FP / (All Actual Negatives)
  • Measures proportion of negatives incorrectly classified as positive

ROC Curve:

  • Plots TPR vs. FPR at various classification thresholds
  • Shows trade-off between sensitivity and specificity
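
The curve can be traced by sorting examples by score and lowering the threshold past one example at a time. This is a minimal sketch that assumes all scores are distinct (ties would need to be grouped into a single step); the data and function names are illustrative.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs from lowering the threshold past each score in turn.

    Assumes all scores are distinct; tied scores would need to be grouped.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for label, _ in sorted(zip(labels, scores), key=lambda pair: -pair[1]):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.10]
points = roc_points(labels, scores)
auc = trapezoid_area(points)
print(points)
print(auc)
```

Each positive example moves the curve up and each negative moves it right, so a model that ranks positives first hugs the top-left corner.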

6. AUC Interpretation

  • Area Under the ROC Curve
  • Represents probability that the model ranks a random positive instance higher than a random negative instance
  • Range: 0 to 1
    • AUC = 0.5: No better than random guessing
    • AUC = 1.0: Perfect classifier
    • AUC < 0.5: Worse than random (but can be inverted)
  • Single number summary of model's discriminative ability
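
The ranking interpretation can be computed directly by comparing every positive with every negative. In this sketch ties are counted as half a win, which is a common convention; the data and function name are illustrative, and the quadratic loop is only suitable for small examples.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.10]

auc = pairwise_auc(labels, scores)
inverted_auc = pairwise_auc(labels, [-s for s in scores])
print(auc, inverted_auc)
```

Here 12 of the 16 positive-negative pairs are ordered correctly, giving 0.75; negating the scores flips every comparison and gives 0.25, which is why a model with AUC below 0.5 can be inverted.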

7. Ranking Metrics

Mean Reciprocal Rank (MRR):

  • Formula: MRR = (1/m) × Σ(1/rankᵢ), where m is the number of queries and rankᵢ is the position of the first relevant item for query i
  • Considers rank of first relevant item only
  • Good when only one relevant result is expected
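
A minimal sketch of the computation, assuming each query's results are given as a list of 1/0 relevance flags in ranked order and that a query with no relevant result contributes 0. The function name and example queries are illustrative.

```python
def mean_reciprocal_rank(ranked_relevance):
    """ranked_relevance: one list per query, 1 = relevant, 0 = not, in ranked order."""
    total = 0.0
    for results in ranked_relevance:
        for position, rel in enumerate(results, start=1):
            if rel:
                total += 1.0 / position
                break  # only the first relevant item counts
        # a query with no relevant result contributes 0 (an assumed convention)
    return total / len(ranked_relevance)

queries = [[0, 0, 1, 0], [1, 0, 0], [0, 1, 1]]
mrr = mean_reciprocal_rank(queries)
print(mrr)
```

The first relevant items sit at ranks 3, 1, and 2, so MRR = (1/3 + 1 + 1/2) / 3 = 11/18.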

Recall@k:

  • Formula: Recall@k = (# relevant items in top k) / (total # relevant items)
  • Measures coverage of relevant items
  • Challenge: The total number of relevant items can be very large, so Recall@k stays low for small k even when the ranking is good

Precision@k:

  • Formula: Precision@k = (# relevant items in top k) / k
  • Measures precision of top k results
  • Doesn't consider ranking quality within top k
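
Both cut-off metrics share the same numerator and differ only in the denominator. The item IDs and relevant set below are made up for illustration.

```python
def precision_at_k(ranked, relevant, k):
    """Share of the top-k results that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Share of all relevant items that appear in the top k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

ranked = ["a", "b", "c", "d", "e"]     # system output, best first
relevant = {"a", "c", "f", "g"}        # ground truth; "f" and "g" were never retrieved

for k in (3, 5):
    print(k, precision_at_k(ranked, relevant, k), recall_at_k(ranked, relevant, k))
```

Going from k = 3 to k = 5 adds no new hits, so precision falls from 2/3 to 2/5 while recall stays at 0.5.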

Average Precision (AP):

  • Averages precision@k over the ranks k at which a relevant item appears
  • Higher if relevant items appear earlier in list
  • Considers both precision and ranking quality

Mean Average Precision (mAP):

  • Average of AP across multiple queries
  • Works well for binary relevance (relevant/not relevant)
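
AP and mAP can be sketched as follows. One simplifying assumption is made and marked in the code: every relevant item is assumed to appear somewhere in the ranked list, so the number of hits equals the total number of relevant items. Function names and example rankings are illustrative.

```python
def average_precision(results):
    """results: 1/0 relevance flags in ranked order.

    Assumes every relevant item appears somewhere in `results`, so the
    number of flags set to 1 is the total number of relevant items.
    """
    hits = 0
    precisions = []
    for position, rel in enumerate(results, start=1):
        if rel:
            hits += 1
            precisions.append(hits / position)  # precision@position at each hit
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(per_query_results):
    return sum(average_precision(r) for r in per_query_results) / len(per_query_results)

early = average_precision([1, 1, 0, 0])   # relevant items at ranks 1 and 2
late = average_precision([0, 0, 1, 1])    # same items at ranks 3 and 4
m_ap = mean_average_precision([[1, 1, 0, 0], [0, 0, 1, 1]])
print(early, late, m_ap)
```

The same two relevant items score 1.0 when ranked first and 5/12 when ranked last, which is the ranking sensitivity that Precision@k lacks.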

Normalized Discounted Cumulative Gain (nDCG):

  • DCG formula: DCGₚ = Σ(relᵢ / log₂(i+1)) for i = 1 to p, where relᵢ is the relevance of the item at position i
  • nDCG = DCG / IDCG (ideal DCG)
  • Handles graded relevance scores (not just binary)
  • Good for continuous relevance scores
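
A minimal sketch of nDCG using the relᵢ / log₂(i + 1) form given above. It assumes the ideal ordering is obtained by sorting the same list of relevance grades, that is, no relevant items are missing from the list; the grades are made up.

```python
import math

def dcg(relevances):
    """DCG with the rel_i / log2(i + 1) form, positions starting at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG divided by the DCG of the same grades sorted best-first."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

ranking = [3, 2, 0, 1]     # graded relevance of the items, in ranked order
score = ndcg(ranking)
print(round(score, 4))
```

Swapping the last two items would produce the ideal order and an nDCG of 1.0; the penalty for this mistake is small because it happens at low-weight positions.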

8. Recommender System Metrics

  1. Precision@k: Proportion of relevant items in top k recommendations
  2. MRR: Good when expecting one relevant item
  3. mAP: For binary relevance (liked/not liked)
  4. nDCG: For graded relevance (ratings 1-5)
  5. Diversity: Average pairwise dissimilarity between recommendations
    • Low similarity score = high diversity
    • Important for user engagement
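
The diversity measure can be sketched with cosine similarity over item feature vectors. The choice of cosine similarity and the toy two-dimensional vectors are assumptions for illustration; any item-to-item similarity could be substituted.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def intra_list_diversity(item_vectors):
    """Average (1 - cosine similarity) over all pairs of recommended items."""
    pairs = list(combinations(item_vectors, 2))
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)

similar_list = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]   # three near-identical items
varied_list = [[1.0, 0.0], [0.0, 1.0]]                # two unrelated items

print(intra_list_diversity(similar_list), intra_list_diversity(varied_list))
```

A list of identical items scores 0 and a list of orthogonal items scores 1, matching the rule that low similarity means high diversity.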

Choosing between metrics:

  • Binary relevance → mAP
  • Graded relevance → nDCG
  • Single relevant item → MRR
  • User experience → Include diversity metrics

Note: This guide covers fundamental concepts commonly asked in machine learning interviews. Continue practicing with real problems and stay updated with the latest developments in the field.

