PracHub

Build an end-to-end ML classification pipeline

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in building end-to-end tabular classification pipelines, including data loading and splitting, missing-value handling, categorical encoding, feature scaling, model training and comparison, hyperparameter tuning, metric-based evaluation, model persistence, and batch inference.

  • Company: Nextdoor
  • Role: Machine Learning Engineer
  • Category: ML System Design
  • Difficulty: medium
  • Interview Round: Technical Screen

Given a tabular dataset in a CSV file, implement an end-to-end pipeline to perform a classification task. Requirements: (1) load the data; (2) create stratified train/validation/test splits; (3) handle missing values and encode categorical features; (4) standardize numeric features; (5) train a simple baseline (e.g., logistic regression) and at least one stronger model (e.g., gradient boosting or a small neural network); (6) tune key hyperparameters with cross-validation; (7) report accuracy, precision, recall, and ROC-AUC on validation and test; (8) persist the trained model and preprocessing steps; (9) implement batch inference via a predict(input_csv_path, output_csv_path) function or CLI. If using a neural network, write a correct training loop with optimizer initialization, forward pass, loss computation, backward pass, and an explicit optimizer step. Briefly explain design choices and how you would productionize this pipeline.

Sep 6, 2025, 12:00 AM

End-to-End Tabular Classification Pipeline (Python)

Context

You are given a tabular dataset in a CSV file and asked to build an end-to-end machine learning pipeline for a classification problem. Assume the dataset contains a column named target (binary classification by default). You may extend to multiclass if desired.
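
As a starting point, the loading step described above might look like the following minimal sketch. The helper names `load_dataset` and `split_column_types` are hypothetical, and dropping rows that have no label is an assumption rather than something the prompt specifies.

```python
import pandas as pd


def load_dataset(csv_path, target_col="target"):
    """Read the CSV and return (X, y): a feature frame and a label series."""
    df = pd.read_csv(csv_path)
    if target_col not in df.columns:
        raise ValueError(f"expected a '{target_col}' column in {csv_path}")
    # Assumption: rows without a label cannot be used for supervised training.
    df = df.dropna(subset=[target_col])
    X = df.drop(columns=[target_col])
    y = df[target_col]
    return X, y


def split_column_types(X):
    """Return (numeric_cols, categorical_cols) based on pandas dtypes."""
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    # Everything that is not numeric (strings, booleans, dates) is treated
    # as categorical in this sketch.
    categorical_cols = [c for c in X.columns if c not in numeric_cols]
    return numeric_cols, categorical_cols
```

Inferring column types from dtypes is a convenience; integer-coded categories would be classified as numeric here, so an explicit column list may be safer on a real dataset.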

Requirements

  1. Load the data from CSV.
  2. Create stratified train/validation/test splits (e.g., 60/20/20).
  3. Handle missing values and encode categorical features.
  4. Standardize numeric features.
  5. Train a simple baseline model (e.g., Logistic Regression) and at least one stronger model (e.g., Gradient Boosting or a small neural network).
  6. Tune key hyperparameters with cross-validation.
  7. Report accuracy, precision, recall, and ROC-AUC on validation and test sets.
  8. Persist the trained model and preprocessing steps.
  9. Implement batch inference via a predict(input_csv_path, output_csv_path) function or CLI.
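
Steps 2 through 7 above can be sketched with scikit-learn as follows. This is one possible arrangement, not a reference solution: the function names, the `CANDIDATES` dictionary, the hyperparameter grids, the median/most-frequent imputation strategies, and the choice of ROC-AUC as the tuning metric are all illustrative assumptions. It also assumes a binary target coded as 0/1.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def stratified_split(X, y, seed=42):
    """60/20/20 train/validation/test split that preserves class ratios."""
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


def make_preprocessor(numeric_cols, categorical_cols):
    """Impute then scale numeric columns; impute then one-hot categoricals."""
    numeric = Pipeline([("impute", SimpleImputer(strategy="median")),
                        ("scale", StandardScaler())])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore"))])
    return ColumnTransformer([("num", numeric, numeric_cols),
                              ("cat", categorical, categorical_cols)])


def tune(model, param_grid, numeric_cols, categorical_cols, X_train, y_train):
    """5-fold grid search over a preprocessing + model pipeline."""
    pipe = Pipeline([("prep", make_preprocessor(numeric_cols, categorical_cols)),
                     ("model", model)])
    search = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc")
    search.fit(X_train, y_train)
    return search.best_estimator_  # refit on the full training split


def report(fitted, X, y):
    """Accuracy, precision, recall, and ROC-AUC for a fitted pipeline."""
    pred = fitted.predict(X)
    score = fitted.predict_proba(X)[:, 1]  # probability of the positive class
    return {"accuracy": accuracy_score(y, pred),
            "precision": precision_score(y, pred, zero_division=0),
            "recall": recall_score(y, pred, zero_division=0),
            "roc_auc": roc_auc_score(y, score)}


# Illustrative grids only; real ranges depend on the dataset.
CANDIDATES = {
    "logreg": (LogisticRegression(max_iter=1000),
               {"model__C": [0.1, 1.0, 10.0]}),
    "gboost": (GradientBoostingClassifier(random_state=0),
               {"model__n_estimators": [50, 100],
                "model__learning_rate": [0.05, 0.1]}),
}


def run(X, y, numeric_cols, categorical_cols):
    """Tune each candidate on train, then report on validation and test."""
    X_tr, X_val, X_te, y_tr, y_val, y_te = stratified_split(X, y)
    results = {}
    for name, (model, grid) in CANDIDATES.items():
        fitted = tune(model, grid, numeric_cols, categorical_cols, X_tr, y_tr)
        results[name] = {"model": fitted,
                         "validation": report(fitted, X_val, y_val),
                         "test": report(fitted, X_te, y_te)}
    return results
```

Wrapping the preprocessing inside the `Pipeline` means the imputers, scaler, and encoder are refit on each cross-validation fold, which keeps validation-fold statistics from leaking into training. The test split is touched only once, after tuning.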

If you choose a neural network, include a correct training loop with optimizer initialization, forward pass, loss computation, backward pass, and optimizer step.
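
To make the five stages concrete without tying the example to a particular deep learning framework, here is a small NumPy sketch of a one-hidden-layer binary classifier. The `SGD` class, the helper names, the layer size, and the learning rate are all hypothetical choices for illustration; the backward pass is written by hand, which a framework would normally derive automatically. It assumes `X` is an already preprocessed float array and `y` is a float array of 0.0/1.0 labels.

```python
import numpy as np


class SGD:
    """Plain gradient descent over a dict of parameter arrays."""

    def __init__(self, params, lr=0.1):
        self.params, self.lr = params, lr

    def step(self, grads):
        for name, grad in grads.items():
            self.params[name] -= self.lr * grad  # in-place update


def init_params(n_features, n_hidden, rng):
    return {"W1": rng.normal(0, np.sqrt(2 / n_features), (n_features, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": rng.normal(0, np.sqrt(2 / n_hidden), (n_hidden, 1)),
            "b2": np.zeros(1)}


def forward(params, X):
    z1 = X @ params["W1"] + params["b1"]
    h = np.maximum(z1, 0.0)                        # ReLU
    logits = (h @ params["W2"] + params["b2"]).ravel()
    return z1, h, logits


def bce_with_logits(logits, y):
    # Numerically stable mean binary cross-entropy on raw logits.
    return np.mean(np.maximum(logits, 0) - logits * y
                   + np.log1p(np.exp(-np.abs(logits))))


def backward(params, X, y, z1, h, logits):
    probs = np.exp(-np.logaddexp(0, -logits))      # stable sigmoid
    dlogits = ((probs - y) / len(y))[:, None]      # d(mean BCE)/d(logits)
    dz1 = (dlogits @ params["W2"].T) * (z1 > 0)    # back through ReLU
    return {"W1": X.T @ dz1, "b1": dz1.sum(axis=0),
            "W2": h.T @ dlogits, "b2": dlogits.sum(axis=0)}


def train(X, y, n_hidden=16, epochs=50, batch_size=32, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    params = init_params(X.shape[1], n_hidden, rng)
    optimizer = SGD(params, lr=lr)                         # 1. optimizer init
    history = []
    for _ in range(epochs):
        order = rng.permutation(len(y))                    # reshuffle each epoch
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            z1, h, logits = forward(params, xb)            # 2. forward pass
            loss = bce_with_logits(logits, yb)             # 3. loss computation
            grads = backward(params, xb, yb, z1, h, logits)  # 4. backward pass
            optimizer.step(grads)                          # 5. optimizer step
        history.append(bce_with_logits(forward(params, X)[2], y))
    return params, history
```

In a framework such as PyTorch the same stages typically appear as constructing an optimizer over the model parameters, calling the model, computing the loss, `loss.backward()`, and `optimizer.step()`, with `optimizer.zero_grad()` clearing stale gradients on each iteration. Forgetting the step or the gradient reset is the usual bug an interviewer looks for.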

Deliverables

  • Clear, well-structured Python code (preferably using scikit-learn for classical models) with docstrings/comments.
  • A short explanation of design choices and how you would productionize this pipeline.
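
For the persistence and batch-inference requirements, one hedged sketch is shown below. It assumes the fitted object is a single scikit-learn `Pipeline` that already contains the preprocessing steps, so saving it with joblib persists both together. `MODEL_PATH`, `save_pipeline`, `main`, and the `prediction` and `score` output column names are hypothetical; only `predict(input_csv_path, output_csv_path)` comes from the prompt, extended here with optional keyword arguments.

```python
import argparse

import joblib
import pandas as pd

MODEL_PATH = "model.joblib"  # hypothetical default artifact location


def save_pipeline(fitted_pipeline, path=MODEL_PATH):
    """Persist preprocessing and model together as one artifact."""
    joblib.dump(fitted_pipeline, path)


def predict(input_csv_path, output_csv_path,
            model_path=MODEL_PATH, target_col="target"):
    """Score every row of input_csv_path and write results to output_csv_path."""
    pipeline = joblib.load(model_path)
    df = pd.read_csv(input_csv_path)
    # Inference files may or may not carry the label; ignore it if present.
    features = df.drop(columns=[target_col], errors="ignore")
    out = df.copy()
    out["prediction"] = pipeline.predict(features)
    out["score"] = pipeline.predict_proba(features)[:, 1]
    out.to_csv(output_csv_path, index=False)
    return out


def main(argv=None):
    """Command-line wrapper around predict()."""
    parser = argparse.ArgumentParser(description="Batch inference from CSV")
    parser.add_argument("input_csv_path")
    parser.add_argument("output_csv_path")
    parser.add_argument("--model-path", default=MODEL_PATH)
    args = parser.parse_args(argv)
    predict(args.input_csv_path, args.output_csv_path,
            model_path=args.model_path)
```

Calling `main()` under an `if __name__ == "__main__":` guard would turn the module into a CLI. When discussing productionization, points that commonly follow from this design include versioning the artifact alongside the library versions used to train it, validating that incoming CSVs have the expected columns before scoring, and monitoring the distribution of inputs and scores over time.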

