A comprehensive collection of common machine learning interview questions and detailed answers, organized by topic.

## Table of Contents

1. ML Fundamentals
2. Regression
3. Regularization
4. Evaluation Metrics

---

## ML Fundamentals

### 1. What are Overfitting and Underfitting?

**Underfitting:**
- Occurs when a machine learning model is too simple to capture the underlying patterns in the data
- The model performs poorly on both the training data and new, unseen data
- Characterized by high training and validation errors
- Solutions:
  - Use more complex models
  - Add more relevant features
  - Reduce regularization strength

**Overfitting:**
- Occurs when a model becomes too complex and starts memorizing the training data instead of learning generalizable patterns
- Training error is significantly lower than validation error
- The model performs poorly on new, unseen data
- Solutions:
  - Reduce model complexity
  - Apply regularization techniques (L1, L2, dropout)
  - Use cross-validation for model selection
  - Collect more training data
  - Apply data augmentation

### 2. What is the Bias-Variance Tradeoff?

**Bias:**
- The difference between the model's average prediction and the true values
- Occurs when the model oversimplifies the underlying patterns and makes strong assumptions
- Leads to underfitting, where the model fails to capture the true relationships between features and the target variable

**Variance:**
- Measures how spread out the predictions are around their expected value, i.e., how much the model changes when trained on different samples of the data
- High-variance models are sensitive to specific data points and may memorize noise or outliers
- Leads to overfitting

**The Tradeoff:**
- Low-variance models tend to be less complex, with simple structure → can lead to high bias
- Low-bias models tend to be more complex, with flexible structure → can lead to high variance
- Decreasing one component often increases the other
- Goal: find the right balance between bias and variance for optimal model performance

### 3. What are Common Methods to Prevent Overfitting?

1. Model Complexity Reduction
   - Use simpler models
   - Reduce the number of parameters
2. Regularization Techniques
   - L1 regularization (Lasso)
   - L2 regularization (Ridge)
   - Dropout (for neural networks)
   - Cross-validation for model selection
3. Early Stopping
   - Stop training when validation performance stops improving
4. Data-based Approaches
   - Collect more training data
   - Data augmentation
   - Remove noisy features

### 4. How to Determine if One Model is Better Than Another?

Given a set of ground truths and two candidate models:

1. Evaluation Metrics
   - Choose appropriate metrics based on the problem type
   - Compare performance across multiple metrics
2. Cross-Validation (a minimal sketch follows this list)
   - Split the data into multiple folds
   - Train each model on the same folds and evaluate it on the corresponding held-out folds
   - Compare the average performance across all folds
3. Statistical Testing
   - Use hypothesis testing to determine whether performance differences are statistically significant
   - A/B testing in production environments
4. Domain Expertise
   - Consider business requirements
   - Evaluate model interpretability
   - Assess computational efficiency
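
As an illustration of the cross-validation comparison in point 2, here is a minimal sketch using scikit-learn. The dataset (`load_diabetes`), the two candidate models, and the MSE metric are placeholder choices for this example, not part of the original answer.

```python
# Minimal sketch: comparing two candidate models with 5-fold cross-validation.
# Dataset, models, and metric are illustrative assumptions only.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

for name, model in candidates.items():
    # scoring returns negated MSE, so negate it back; lower MSE is better
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"{name}: mean MSE = {-scores.mean():.1f} (std = {scores.std():.1f})")
```

Whether any observed difference is meaningful can then be checked with the statistical tests mentioned in point 3.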
---

## Regression

### 1. What are the Basic Assumptions of Linear Regression?

1. Linearity: There is a linear relationship between the independent variables (X) and the dependent variable (y)
2. Independence: There is no relationship or correlation between the errors (residuals) of different observations
3. Normality: The residuals are normally distributed
4. Homoscedasticity: The variability of the errors (residuals) is constant across all levels of the independent variables
5. No Multicollinearity: The independent variables are not highly correlated with each other

### 2. What Happens with Correlated Variables? How to Solve?

Problems with correlated variables:
- Unstable coefficient estimates
- Unreliable significance tests
- Difficulty interpreting individual variable contributions
- Inflated standard errors

Solutions:
- Feature selection (remove redundant features)
- Ridge regression (L2 regularization)
- Principal Component Analysis (PCA)
- Feature engineering to create uncorrelated features

### 3. Explain Regression Coefficients

- Coefficients represent the change in the dependent variable associated with a one-unit change in the corresponding independent variable, holding all other variables constant
- Interpretation example: if β₁ = 2.5, then a one-unit increase in X₁ is associated with a 2.5-unit increase in y, assuming all other variables remain constant
- Important: interpretation should be done with caution and within the context of the specific model and dataset

### 4. Relationship Between Minimizing Squared Error and Maximizing Likelihood

- In linear regression with Gaussian error assumptions, minimizing the squared error is equivalent to maximizing the likelihood of the observed data
- The connection arises because, under Gaussian errors, the negative log-likelihood reduces to the sum of squared errors plus terms that do not depend on the coefficients (see the derivation below)
- When the Gaussian error assumptions don't hold (e.g., non-Gaussian or heteroscedastic errors), this equivalence may no longer hold, and least squares is no longer guaranteed to be the maximum-likelihood solution
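
A sketch of the standard derivation behind point 4, assuming yᵢ = xᵢᵀβ + εᵢ with i.i.d. errors εᵢ ~ N(0, σ²) and σ² treated as fixed:

```latex
% Likelihood of the observations under i.i.d. Gaussian errors
L(\beta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
           \exp\!\left(-\frac{(y_i - x_i^{\top}\beta)^2}{2\sigma^2}\right)

% Log-likelihood; the first term does not depend on beta
\log L(\beta) = -\frac{n}{2}\log(2\pi\sigma^2)
                - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^2

% Maximizing log L over beta is therefore equivalent to minimizing
% the sum of squared errors
\hat{\beta}_{\mathrm{MLE}}
  = \arg\max_{\beta}\, \log L(\beta)
  = \arg\min_{\beta}\, \sum_{i=1}^{n}\left(y_i - x_i^{\top}\beta\right)^2
  = \hat{\beta}_{\mathrm{OLS}}
```

If the errors were instead assumed to be Laplace-distributed, the same argument would lead to minimizing absolute rather than squared errors, which is why the equivalence is specific to the Gaussian assumption.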