Build an end-to-end ML classification pipeline

This question evaluates a candidate's competency in building end-to-end tabular classification pipelines, including data loading and splitting, missing-value handling, categorical encoding, feature scaling, model training and comparison, hyperparameter tuning, metric-based evaluation, model persistence, and batch inference.

Question

End-to-End Tabular Classification Pipeline (Python)

Context

You are given a tabular dataset in a CSV file and asked to build an end-to-end machine learning pipeline for a classification problem. Assume the dataset contains a label column named target. The task is binary classification by default; you may extend it to multiclass if desired.

Requirements

Load the data from CSV.
Create stratified train/validation/test splits (e.g., 60/20/20).
Handle missing values and encode categorical features.
Standardize numeric features.
Train a simple baseline model (e.g., Logistic Regression) and at least one stronger model (e.g., Gradient Boosting or a small neural network).
Tune key hyperparameters with cross-validation.
Report accuracy, precision, recall, and ROC-AUC on validation and test sets.
Persist the trained model and preprocessing steps.
Implement batch inference via a predict(input_csv_path, output_csv_path) function or CLI.
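
One possible way to cover the splitting, preprocessing, training, tuning, and reporting requirements above is sketched below with pandas and scikit-learn. It is a minimal sketch under stated assumptions, not a reference solution: the function names (split_60_20_20, build_preprocessor, candidate_searches, report, run), the hyperparameter grids, the median/most-frequent imputation strategies, and the assumption that target is encoded as 0/1 are all illustrative choices that do not come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def split_60_20_20(df, target="target", seed=42):
    """Stratified 60/20/20 train/validation/test split."""
    X, y = df.drop(columns=[target]), df[target]
    # Hold out 20% for test, then 25% of the remaining 80% (20% overall) for validation.
    X_tmp, X_test, y_tmp, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


def build_preprocessor(X):
    """Impute + scale numeric columns; impute + one-hot encode the rest."""
    num_cols = X.select_dtypes(include="number").columns.tolist()
    cat_cols = [c for c in X.columns if c not in num_cols]
    numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                        ("scale", StandardScaler())])
    categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                            ("onehot", OneHotEncoder(handle_unknown="ignore"))])
    return ColumnTransformer([("num", numeric, num_cols),
                              ("cat", categorical, cat_cols)])


def candidate_searches(preprocessor, seed=42):
    """Baseline and stronger model, each wrapped in a 5-fold grid search (illustrative grids)."""
    logreg = Pipeline([("prep", preprocessor), ("clf", LogisticRegression(max_iter=1000))])
    gbm = Pipeline([("prep", preprocessor), ("clf", GradientBoostingClassifier(random_state=seed))])
    return {
        "logreg": GridSearchCV(logreg, {"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="roc_auc"),
        "gbm": GridSearchCV(gbm, {"clf__n_estimators": [50, 100], "clf__max_depth": [2, 3]},
                            cv=5, scoring="roc_auc"),
    }


def report(model, X, y):
    """Binary metrics; assumes labels are 0/1 and class 1 is the positive class."""
    pred = model.predict(X)
    proba = model.predict_proba(X)[:, 1]
    return {
        "accuracy": accuracy_score(y, pred),
        "precision": precision_score(y, pred, zero_division=0),
        "recall": recall_score(y, pred, zero_division=0),
        "roc_auc": roc_auc_score(y, proba),
    }


def run(df, target="target"):
    """Tune each candidate on the training split, then report validation and test metrics."""
    train, val, test = split_60_20_20(df, target)
    results = {}
    for name, search in candidate_searches(build_preprocessor(train[0])).items():
        search.fit(*train)  # cross-validation uses the training split only
        best = search.best_estimator_
        results[name] = {"model": best, "val": report(best, *val), "test": report(best, *test)}
    return results
```

Calling run(pd.read_csv(path)) on a CSV with a target column would return, per model, the refit best pipeline plus its validation and test metrics. Keeping the preprocessing inside each Pipeline means imputation and scaling statistics are re-estimated within every cross-validation fold, which avoids leaking validation-fold information into training.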

If you choose a neural network, include a correct training loop with optimizer initialization, forward pass, loss computation, backward pass, and optimizer step.
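
For the neural-network option, the five steps named above could look roughly like the following PyTorch sketch. PyTorch itself, the function name train_mlp, the layer sizes, the Adam optimizer, and the learning rate are assumptions made for illustration; the question does not prescribe a framework. Inputs are expected to be already preprocessed numeric features with 0/1 labels.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_mlp(X, y, epochs=20, lr=1e-2, batch_size=32, seed=0):
    """Train a small MLP for binary classification.

    X: array-like of shape (n, d); y: array-like of shape (n,) with 0/1 labels.
    Returns the trained model and the mean training loss per epoch.
    """
    torch.manual_seed(seed)
    X = torch.as_tensor(X, dtype=torch.float32)
    y = torch.as_tensor(y, dtype=torch.float32)

    model = nn.Sequential(nn.Linear(X.shape[1], 16), nn.ReLU(), nn.Linear(16, 1))
    loss_fn = nn.BCEWithLogitsLoss()  # expects raw logits, applies sigmoid internally
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer initialization
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)

    history = []
    for _ in range(epochs):
        model.train()
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()          # clear gradients left from the previous step
            logits = model(xb).squeeze(1)  # forward pass
            loss = loss_fn(logits, yb)     # loss computation
            loss.backward()                # backward pass
            optimizer.step()               # optimizer step
            total += loss.item() * len(xb)
        history.append(total / len(X))
    return model, history
```

A common mistake this ordering guards against is omitting optimizer.zero_grad(), which lets gradients accumulate across batches. At inference time, probabilities would be obtained with torch.sigmoid on the model output inside a torch.no_grad() block.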

Deliverables

Clear, well-structured Python code (preferably using scikit-learn for classical models) with docstrings/comments.
A short explanation of design choices and how you would productionize this pipeline.
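
As a starting point for the productionization discussion, persistence and batch inference might be sketched as follows, assuming the fitted object is a scikit-learn Pipeline that bundles preprocessing with the classifier and is serialized with joblib. The default file name model.joblib, the save_model helper, the extra model_path and target keyword arguments, and the output column names prediction and probability are hypothetical choices, not part of the question.

```python
import joblib
import pandas as pd

MODEL_PATH = "model.joblib"  # hypothetical default location


def save_model(pipeline, path=MODEL_PATH):
    """Persist the fitted pipeline (preprocessing + model) as a single artifact."""
    joblib.dump(pipeline, path)


def predict(input_csv_path, output_csv_path, model_path=MODEL_PATH, target="target"):
    """Batch inference: read a CSV, score every row, and write the results to a CSV."""
    model = joblib.load(model_path)
    df = pd.read_csv(input_csv_path)
    # Drop the label column if the input happens to include it.
    X = df.drop(columns=[target], errors="ignore")
    out = df.copy()
    out["prediction"] = model.predict(X)
    out["probability"] = model.predict_proba(X)[:, 1]  # positive-class probability (binary case)
    out.to_csv(output_csv_path, index=False)
    return out
```

Saving one pipeline object rather than separate encoder, scaler, and model files keeps training-time and inference-time preprocessing identical. Note that joblib artifacts are generally only safe to load with the same library versions that produced them, so pinning dependencies alongside the artifact is a sensible part of the productionization story.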
