How do I approach Machine Learning interview questions?

Machine Learning questions require a solid grasp of core concepts as well as practice. PracHub provides solutions with explanations to help you master machine learning interviews.

What difficulty level is this interview question?

This is a medium-difficulty Machine Learning question, commonly asked during Technical Screen rounds at Nextdoor.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at Nextdoor during technical interviews.

Build a model using only pandas/numpy | Nextdoor Interview Question

Quick Overview

This question evaluates applied machine learning skills: data preprocessing, categorical encoding, feature scaling, implementing and optimizing a baseline model using only numpy/pandas, and selecting appropriate evaluation metrics.

You are given a tabular dataset as a pandas DataFrame df with:

- Feature columns (numeric and/or categorical)
- A target column y (either a binary class label or a continuous regression target)
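Because the same DataFrame may hold either kind of target, a solution first has to decide which task it is solving. A minimal sketch, assuming a simple heuristic (exactly two distinct non-null values means binary classification; any other numeric target means regression). The helper names `infer_task` and `encode_binary_target` are hypothetical:

```python
import numpy as np
import pandas as pd


def infer_task(y: pd.Series) -> str:
    """Hypothetical helper: guess the task type from the target column.

    Heuristic (an assumption): two distinct non-null values -> binary
    classification; otherwise a numeric target -> regression.
    """
    n_unique = y.dropna().nunique()
    if n_unique == 2:
        return "binary"
    if pd.api.types.is_numeric_dtype(y):
        return "regression"
    raise ValueError(f"unsupported non-numeric target with {n_unique} classes")


def encode_binary_target(y: pd.Series) -> np.ndarray:
    """Hypothetical helper: map the two classes to 0.0/1.0.

    The larger label in sorted order becomes 1 (e.g. 'no'/'yes' -> 0/1).
    Rows with a missing target should be dropped before calling this.
    """
    classes = np.sort(y.dropna().unique())
    return (y == classes[1]).astype(float).to_numpy()
```

A numeric target with only two distinct values (e.g. 0/1) is treated as binary; if such a column is really a regression target, override the heuristic by hand.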

You may use pandas and numpy (and standard Python), and you may use Google for documentation, but you may not use AI assistants or high-level ML libraries (e.g., scikit-learn).

Tasks:

Data preparation
- Handle missing values.
- Encode categorical variables.
- Split into train/validation (or implement cross-validation).
- Standardize/normalize features when appropriate.
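The four preparation steps can be sketched with pandas and numpy alone. This is one possible design, not a reference solution: numeric gaps are filled with the training median, categorical gaps become an explicit `__missing__` level, categoricals are one-hot encoded against the levels seen in training, and numeric columns are standardized with training-split statistics only, so nothing from the validation rows leaks into preprocessing. All function names are hypothetical.

```python
import numpy as np
import pandas as pd


def train_val_split(df, val_frac=0.2, seed=0):
    """Shuffle rows, then hold out a fraction of them for validation."""
    idx = np.random.default_rng(seed).permutation(len(df))
    n_val = int(round(len(df) * val_frac))
    return df.iloc[idx[n_val:]], df.iloc[idx[:n_val]]


def _clean_cat(s):
    """Treat missing categorical values as their own level; compare levels as strings."""
    return s.astype("object").where(s.notna(), "__missing__").astype(str)


def fit_preprocessor(train, target="y"):
    """Learn every preprocessing statistic from the training split only."""
    X = train.drop(columns=[target])
    num_cols = X.select_dtypes(include="number").columns.tolist()
    cat_cols = [c for c in X.columns if c not in num_cols]
    medians = X[num_cols].median()
    filled = X[num_cols].fillna(medians)
    stds = filled.std(ddof=0).replace(0, 1.0)  # avoid dividing by zero for constant columns
    return {
        "num_cols": num_cols,
        "cat_cols": cat_cols,
        "medians": medians,
        "means": filled.mean(),
        "stds": stds,
        "levels": {c: sorted(_clean_cat(X[c]).unique()) for c in cat_cols},
    }


def transform(df, state, target="y"):
    """Apply the fitted imputation, scaling and one-hot encoding; returns a float matrix."""
    X = df.drop(columns=[target], errors="ignore")
    num = (X[state["num_cols"]].fillna(state["medians"]) - state["means"]) / state["stds"]
    parts = [num.astype(float)]
    for c in state["cat_cols"]:
        # Levels unseen in training become NaN here and therefore an all-zero one-hot row.
        cat = pd.Categorical(_clean_cat(X[c]), categories=state["levels"][c])
        parts.append(pd.get_dummies(pd.Series(cat, index=X.index), prefix=c, dtype=float))
    return pd.concat(parts, axis=1).to_numpy(dtype=float)
```

Keeping every level (no dropped reference column) makes the one-hot block collinear with an intercept. That is harmless for gradient descent, especially with L2 regularization, but would make a closed-form normal-equations solve singular.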
Modeling (from scratch)
- Choose a reasonable baseline model (e.g., linear regression for regression; logistic regression for binary classification).
- Implement training using numpy (e.g., gradient descent).
- Implement prediction.
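Both suggested baselines can share one training loop: the gradient of half the mean squared error with respect to the linear score, and the gradient of the mean log-loss with respect to the logit, both reduce to "prediction minus target". A minimal batch-gradient-descent sketch, assuming the features are already numeric and standardized; the optional `l2` penalty, the hyperparameter defaults, and all function names are illustrative choices.

```python
import numpy as np


def sigmoid(z):
    # tanh form is numerically stable for large |z| (no overflow in exp).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_gd(X, y, task="regression", lr=0.1, n_iter=2000, l2=0.0):
    """Batch gradient descent.

    regression: minimizes (1/2n) * sum((Xw + b - y)^2) + (l2/2) * ||w||^2
    binary:     minimizes mean log-loss of sigmoid(Xw + b) + (l2/2) * ||w||^2
    The bias b is not penalized.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        z = X @ w + b
        err = (z if task == "regression" else sigmoid(z)) - y
        w -= lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def predict(X, w, b, task="regression", threshold=0.5):
    """Real-valued predictions for regression; 0/1 labels for binary classification."""
    z = X @ w + b
    if task == "regression":
        return z
    return (sigmoid(z) >= threshold).astype(int)


def predict_proba(X, w, b):
    """P(y = 1 | x) for the binary model; these scores are what AUC needs."""
    return sigmoid(X @ w + b)
```

Standardized features matter here: with a single learning rate, badly scaled columns make gradient descent slow or unstable, which is one concrete answer to "when does feature scaling matter".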
Evaluation
- Pick suitable metrics (e.g., MSE/RMSE for regression; accuracy/precision/recall/F1/AUC for classification).
- Explain how you would detect overfitting and what you would do about it.
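The metrics named above are short to write in numpy/pandas. A sketch, assuming 0/1 labels and, for AUC, real-valued scores such as predicted probabilities; AUC is computed through the rank-sum (Mann-Whitney) identity rather than by tracing the ROC curve. For overfitting, the usual signal is a training metric much better than the validation metric, or than the average across k folds; the fold generator below is a plain (unstratified) k-fold split. All names are hypothetical.

```python
import numpy as np
import pandas as pd


def rmse(y_true, y_pred):
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from 0/1 labels (0.0 when a ratio is undefined)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": float(np.mean(y_true == y_pred)), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """P(random positive scores above random negative); ties count 1/2 via average ranks."""
    y_true = np.asarray(y_true)
    ranks = pd.Series(np.asarray(scores, dtype=float)).rank(method="average").to_numpy()
    n_pos = int(np.sum(y_true == 1))
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; every row lands in exactly one validation fold."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]
```

In use, compute the same metric on the training rows and on the held-out rows of each fold; a large, consistent gap suggests overfitting, which is typically addressed with stronger regularization, fewer or simpler features, or more data. Preprocessing statistics should be refit inside each fold for the same leakage reason as in a single split.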
Concept questions (be prepared to explain)
- Bias–variance tradeoff
- Regularization (L1 vs L2) and how it changes the objective
- Class imbalance handling
- Feature scaling: when it matters and why
- Train/validation/test leakage and how to avoid it
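Regularization and class-imbalance handling both change the training objective directly, which a small extension of logistic regression makes concrete. A sketch under stated assumptions: the objective is the (optionally class-weighted) mean log-loss plus `l1 * sum(|w_j|)` plus `(l2 / 2) * sum(w_j^2)`. The L2 term enters the gradient as `l2 * w` and shrinks every weight smoothly, while the L1 term is handled by a soft-thresholding (proximal, ISTA-style) step that can set weights exactly to zero. `class_weight="balanced"` weights each class inversely to its frequency, mirroring the formula scikit-learn uses for the option of the same name; the function name is hypothetical. Like scaling statistics, such weights should be computed from the training split only.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_logistic_regularized(X, y, l1=0.0, l2=0.0, class_weight=None, lr=0.1, n_iter=3000):
    """Proximal gradient descent on
        weighted mean log-loss + l1 * ||w||_1 + (l2 / 2) * ||w||_2^2
    The bias is not penalized. y must contain 0/1 labels.
    """
    n, d = X.shape
    if class_weight == "balanced":
        # n / (2 * n_class): each class contributes equally; weights average to 1.
        pos_frac = y.mean()
        sample_w = np.where(y == 1, 0.5 / pos_frac, 0.5 / (1.0 - pos_frac))
    else:
        sample_w = np.ones(n)
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        err = sample_w * (sigmoid(X @ w + b) - y)
        # Smooth part: weighted log-loss gradient plus the L2 term.
        w = w - lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
        # Non-smooth L1 part: soft-thresholding sets small weights exactly to zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w, b
```

Raising `l1` or `l2` lowers variance at the cost of some bias, which is the bias-variance tradeoff in its most practical form; the penalty strengths here are illustrative and would normally be chosen by validation or cross-validation. Balanced weighting shifts the decision threshold toward the minority class, trading precision for recall.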
