End-to-End Tabular Classification Workflow in Google Colab
You are asked to design and implement a complete classification workflow for a tabular dataset in Google Colab.
Include the following:
- Data loading and basic setup (Colab specifics, package installs, reproducibility seed).
- Exploratory Data Analysis (EDA): schema, missingness, target distribution, and quick sanity checks.
- Feature preprocessing: handling missing values, scaling numeric features, encoding categoricals, handling rare categories, and guarding against leakage.
- Data splitting strategy: train/validation/test with stratification; justify your choices (e.g., time-based splits if time features exist).
- Baselines and model selection: build a naive baseline and a simple linear model, then consider stronger non-linear models. Discuss algorithm trade-offs.
- Cross-validation and hyperparameter tuning: use an appropriate CV strategy (e.g., StratifiedKFold), choose a scoring metric, and tune hyperparameters.
- Class imbalance: diagnose and mitigate it (class weights, resampling such as SMOTE, thresholding strategies). Explain when and why to use each.
- Evaluation: select and justify metrics (accuracy, precision/recall, F1, ROC-AUC, PR-AUC); show threshold selection for operational goals.
- Confidence intervals: report uncertainty for key metrics using a sound method (e.g., the bootstrap).
- Leakage prevention: show how your pipeline avoids leakage across preprocessing, resampling, tuning, and evaluation.
- Interpretation and iteration: interpret the model (feature importance, coefficients, permutation importance), perform error analysis, and outline next iteration steps.
Provide code or clear pseudocode illustrating the structure and key steps. Explain trade-offs and how you would interpret results and iterate.