Build pipeline for imbalanced classification

Q: Build pipeline for imbalanced classification

This question tests a candidate's ability to design and implement an end-to-end pipeline for imbalanced classification: numeric and categorical preprocessing, a resampling method suited to severe class imbalance, model training, and evaluation with precision, recall, and F1.

Question

Build an Imbalanced Classification Pipeline (scikit-learn + imbalanced-learn)

Context

You are given a tabular dataset with a severely imbalanced binary target (e.g., minority class rate < 5%). Build an end-to-end classification pipeline that:

- Applies standard preprocessing to numeric and categorical features.
- Uses an appropriate resampling method to address imbalance.
- Trains a classifier.
- Evaluates precision, recall, and F1-score on a held-out test set.

Assume the input features X are in a pandas DataFrame and the target y is a pandas Series.
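To make the setup concrete, here is a minimal sketch of data matching these assumptions. Everything in it is hypothetical: the column names (`age`, `income`, `plan`, `region`), the sample sizes, and the 3% minority rate are illustrative stand-ins for whatever the real dataset contains.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples, n_minority = 2000, 60  # hypothetical sizes: 60 / 2000 = 3% minority

# Target: exactly n_minority positives placed at random positions.
y = pd.Series(np.zeros(n_samples, dtype=int), name="target")
y.iloc[rng.choice(n_samples, size=n_minority, replace=False)] = 1

# Mixed-type features; positives are shifted slightly so a model has some signal.
X = pd.DataFrame({
    "age": rng.normal(40, 10, n_samples) + 5 * y.to_numpy(),
    "income": rng.lognormal(10, 0.5, n_samples),
    "plan": pd.Series(rng.choice(["basic", "plus", "premium"], n_samples), dtype=object),
    "region": pd.Series(rng.choice(["north", "south", "east", "west"], n_samples), dtype=object),
})

# Inject missing values so that imputation has something to do.
X.loc[rng.random(n_samples) < 0.05, "age"] = np.nan
X.loc[rng.random(n_samples) < 0.05, "plan"] = np.nan

print(y.value_counts(normalize=True))  # class proportions
print(X.dtypes)                        # two numeric and two object columns
```

Checking the class proportions and dtypes up front helps decide which columns go to the numeric and categorical branches of the preprocessing.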

Requirements

- Split the data into train/test using stratification to preserve class ratios.
- Preprocess features:
  - Numeric: impute missing values and standardize.
  - Categorical: impute missing values and encode so that categories unseen during training do not cause errors (e.g., one-hot encoding that ignores unknown categories).
- Resample only the training data (avoid leakage) using a suitable method:
  - If only numeric features: SMOTE is acceptable.
  - If mixed types: use SMOTENC, which needs to be told which columns are categorical, so that synthetic samples get valid category values rather than interpolated ones.
- Train a reasonable baseline classifier (e.g., logistic regression or tree-based model).
- Report precision, recall, and F1-score on the test set (per-class and macro/weighted averages are acceptable).

Deliverables

- Reproducible Python code using scikit-learn and imbalanced-learn that implements the above and prints metrics on the held-out test set.
- Brief comments justifying major choices (resampling method, pipeline order).
