Build an end-to-end ML classification pipeline

This is an ML System Design interview question from Nextdoor for Machine Learning Engineer roles.

Question

End-to-End Tabular Classification Pipeline (Python)

Context

You are given a tabular dataset in a CSV file and asked to build an end-to-end machine learning pipeline for a classification problem. Assume the dataset contains a column named target (binary classification by default). You may extend to multiclass if desired.

Requirements

Load the data from CSV.
Create stratified train/validation/test splits (e.g., 60/20/20).
Handle missing values and encode categorical features.
Standardize numeric features.
Train a simple baseline model (e.g., Logistic Regression) and at least one stronger model (e.g., Gradient Boosting or a small neural network).
Tune key hyperparameters with cross-validation.
Report accuracy, precision, recall, and ROC-AUC on validation and test sets.
Persist the trained model and preprocessing steps.
Implement batch inference via a predict(input_csv_path, output_csv_path) function or CLI.
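
The splitting, preprocessing, training, tuning, and reporting requirements above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not a reference solution: it assumes a binary target coded 0/1 in a column named target, treats every non-numeric column as categorical, and the helper names (split_60_20_20, make_preprocessor, tune, report, run) and the small hyperparameter grids are hypothetical choices made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def split_60_20_20(df, target="target", seed=42):
    """Stratified train/validation/test split (roughly 60/20/20)."""
    X, y = df.drop(columns=[target]), df[target]
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


def make_preprocessor(X):
    """Impute + scale numeric columns; impute + one-hot encode the rest."""
    num_cols = X.select_dtypes(include="number").columns.tolist()
    cat_cols = [c for c in X.columns if c not in num_cols]
    numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                        ("scale", StandardScaler())])
    categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                            ("onehot", OneHotEncoder(handle_unknown="ignore"))])
    return ColumnTransformer([("num", numeric, num_cols),
                              ("cat", categorical, cat_cols)])


def tune(estimator, param_grid, X_train, y_train, seed=42):
    """Grid-search hyperparameters with stratified 5-fold CV on the training set.

    Preprocessing lives inside the pipeline, so it is refit on each CV fold
    and never sees the held-out fold.
    """
    pipe = Pipeline([("prep", make_preprocessor(X_train)), ("model", estimator)])
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=cv)
    return search.fit(X_train, y_train).best_estimator_


def report(model, X, y):
    """Accuracy, precision, recall, and ROC-AUC for a fitted binary classifier."""
    pred = model.predict(X)
    proba = model.predict_proba(X)[:, 1]
    return {"accuracy": accuracy_score(y, pred),
            "precision": precision_score(y, pred, zero_division=0),
            "recall": recall_score(y, pred, zero_division=0),
            "roc_auc": roc_auc_score(y, proba)}


def run(df):
    """Tune a baseline and a stronger model, then score both on val and test."""
    X_train, X_val, X_test, y_train, y_val, y_test = split_60_20_20(df)
    candidates = {
        "logreg": (LogisticRegression(max_iter=1000),
                   {"model__C": [0.1, 1.0, 10.0]}),
        "gboost": (GradientBoostingClassifier(random_state=42),
                   {"model__n_estimators": [50, 100], "model__max_depth": [2, 3]}),
    }
    results = {}
    for name, (estimator, grid) in candidates.items():
        best = tune(estimator, grid, X_train, y_train)
        results[name] = {"model": best,
                         "val": report(best, X_val, y_val),
                         "test": report(best, X_test, y_test)}
    return results
```

With a CSV on disk, usage would look like results = run(pd.read_csv("data.csv")), where data.csv is a placeholder file name. Keeping the imputers, scaler, and encoder inside the same Pipeline object as the model means a single fitted object carries both the preprocessing steps and the classifier.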

If you choose a neural network, include a correct training loop with optimizer initialization, forward pass, loss computation, backward pass, and optimizer step.
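
For the neural-network option, the five steps named above could look like the following PyTorch sketch. It assumes the features have already been preprocessed into a float32 tensor X of shape (n, d) and the labels into a float32 tensor y of 0/1 values; the function name train_mlp, the layer sizes, and the default hyperparameters are illustrative assumptions rather than part of the question.

```python
import torch
from torch import nn


def train_mlp(X, y, epochs=20, lr=1e-2, batch_size=32, seed=0):
    """Train a small MLP for binary classification with mini-batch Adam."""
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU(), nn.Linear(16, 1))
    loss_fn = nn.BCEWithLogitsLoss()  # expects raw logits, applies sigmoid internally
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer initialization

    losses = []
    for _ in range(epochs):
        model.train()
        perm = torch.randperm(X.shape[0])  # reshuffle the rows every epoch
        epoch_loss = 0.0
        for start in range(0, X.shape[0], batch_size):
            idx = perm[start:start + batch_size]
            optimizer.zero_grad()              # clear gradients from the previous step
            logits = model(X[idx]).squeeze(1)  # forward pass
            loss = loss_fn(logits, y[idx])     # loss computation
            loss.backward()                    # backward pass
            optimizer.step()                   # optimizer step (parameter update)
            epoch_loss += loss.item() * idx.numel()
        losses.append(epoch_loss / X.shape[0])  # mean training loss for the epoch
    return model, losses
```

Calling optimizer.zero_grad() before each backward pass matters because PyTorch accumulates gradients by default; omitting it is a common bug in hand-written loops.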

Deliverables

Clear, well-structured Python code (preferably using scikit-learn for classical models) with docstrings/comments.
A short explanation of design choices and how you would productionize this pipeline.
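
As a starting point for the persistence and batch-inference requirements, and for the productionization discussion, one possible sketch uses joblib to store a fitted scikit-learn pipeline (preprocessing plus model in one object). The model_path and threshold parameters, the default file name model.joblib, and the output column names score and prediction are assumptions added for illustration; only the predict(input_csv_path, output_csv_path) shape comes from the question.

```python
import joblib
import pandas as pd


def save_model(fitted_pipeline, model_path="model.joblib"):
    """Persist the fitted pipeline (preprocessing + model) as a single artifact."""
    joblib.dump(fitted_pipeline, model_path)


def predict(input_csv_path, output_csv_path, model_path="model.joblib", threshold=0.5):
    """Score every row of input_csv_path and write the results to output_csv_path."""
    model = joblib.load(model_path)
    df = pd.read_csv(input_csv_path)
    # Drop the label if the input file happens to contain it.
    features = df.drop(columns=["target"], errors="ignore")
    proba = model.predict_proba(features)[:, 1]  # probability of the positive class
    out = df.copy()
    out["score"] = proba
    out["prediction"] = (proba >= threshold).astype(int)
    out.to_csv(output_csv_path, index=False)
    return out
```

Because joblib artifacts are pickles, they should only be loaded from trusted locations, and it is worth recording the scikit-learn version alongside the file since pickled estimators are not guaranteed to load across versions.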
