Build a defensible ML pipeline end-to-end

This is a Machine Learning interview question from Thumbtack for Data Scientist roles. View the full question and solution on PracHub.

Question

End-to-End Binary Classification Pipeline on Tabular Data (Numeric, Categorical, Text)

Context

You are handed a tabular dataset that includes numerical features, categorical features (some high-cardinality), and short free-text fields, plus a binary target. Observations have timestamps. The business will act on the model by ranking or thresholding scores (e.g., contact, route, approve) with a limited budget. Positives may be rare. Stakeholders care about stable lift, calibrated probabilities, and fairness across key segments such as region and job_category.

Task

Design a production-ready modeling pipeline that you can defend during an onsite interview. Cover the following:

Business Objective, Optimization Metric, and Decision Threshold
- State a concrete business decision the model supports.
- Choose an optimization metric appropriate for rare positives (e.g., PR-AUC) and specify any secondary metrics.
- Define how you will set a decision threshold (or top-K) tied to costs/lift.
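
One way to make the metric and threshold bullets concrete is sketched below. It is a minimal illustration rather than a prescribed answer: the function names (`average_precision`, `top_k_threshold`, `lift_at_k`) and the notion of a fixed action `budget` are assumptions introduced here, and tied scores are simply broken by input order.

```python
def average_precision(y_true, scores):
    """Step-wise PR-AUC estimate: mean precision taken at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits if hits else 0.0


def top_k_threshold(scores, budget):
    """Score of the budget-th highest item; acting on score >= threshold spends the budget."""
    if budget < 1:
        raise ValueError("budget must be at least 1")
    ranked = sorted(scores, reverse=True)
    return ranked[min(budget, len(ranked)) - 1]


def lift_at_k(y_true, scores, budget):
    """Precision among the top `budget` items divided by the overall base rate."""
    base_rate = sum(y_true) / len(y_true)
    if base_rate == 0:
        raise ValueError("lift is undefined without any positives")
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:budget]
    return (sum(y_true[i] for i in top) / len(top)) / base_rate
```

In use, the threshold would be derived on validation data only and then frozen, so that the reported lift on the outer test folds reflects what the business would actually see under the same budget.
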
Data Splitting Strategy
- Use time-based splits if temporal; otherwise stratified/grouped splits.
- Incorporate nested cross-validation (outer for unbiased evaluation, inner for tuning).
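
The combination of time-based splitting and nested cross-validation can be sketched with plain index arithmetic. The snippet assumes rows are already sorted by timestamp; the helper names (`expanding_window_splits`, `nested_time_splits`) and the optional `gap` parameter (rows dropped between training and test to cover a label window) are hypothetical and introduced only for illustration.

```python
def expanding_window_splits(n_samples, n_splits, gap=0):
    """Yield (train_idx, test_idx) pairs where training rows always precede test rows."""
    fold = n_samples // (n_splits + 1)
    if fold < 1:
        raise ValueError("not enough rows for the requested number of splits")
    for k in range(1, n_splits + 1):
        test_start = fold * k
        test_end = n_samples if k == n_splits else fold * (k + 1)
        train_end = test_start - gap  # leave `gap` rows unused before the test block
        if train_end < 1:
            continue  # the gap swallowed the whole training window
        yield list(range(train_end)), list(range(test_start, test_end))


def nested_time_splits(n_samples, n_outer, n_inner, gap=0):
    """Yield (outer_train, outer_test, inner_folds) for nested, time-ordered CV."""
    for outer_train, outer_test in expanding_window_splits(n_samples, n_outer, gap):
        inner_folds = [
            ([outer_train[i] for i in tr], [outer_train[i] for i in va])
            for tr, va in expanding_window_splits(len(outer_train), n_inner, gap)
        ]
        yield outer_train, outer_test, inner_folds
```

Hyperparameters would be chosen using only `inner_folds`, then the chosen configuration refit on `outer_train` and scored once on `outer_test`. scikit-learn's `TimeSeriesSplit` provides a comparable expanding-window splitter.
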
Preprocessing
- Imputation plans for numeric/categorical/text; add missingness indicators where appropriate.
- Leakage checks tied to timestamps and label windows.
- Rare-category handling and high-cardinality encoding (e.g., out-of-fold target encoding with smoothing).
- Text feature extraction approach.
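
Out-of-fold target encoding with smoothing, mentioned above for high-cardinality categoricals, might look roughly like this. The sketch is an assumption-laden illustration: the function names, the additive smoothing form, and the interleaved fold assignment (`i % n_folds`) are choices made here, and for timestamped data the folds would more defensibly be built so that each row is encoded only from earlier rows.

```python
from collections import defaultdict


def smoothed_means(categories, targets, prior, smoothing):
    """Per-category target mean shrunk toward `prior`; small categories shrink the most."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return {c: (sums[c] + smoothing * prior) / (counts[c] + smoothing) for c in counts}


def oof_target_encode(categories, targets, n_folds=5, smoothing=10.0):
    """Encode each row using statistics fitted on the other folds only."""
    if n_folds < 2:
        raise ValueError("need at least two folds for out-of-fold encoding")
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        fit = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        fit_targets = [targets[i] for i in fit]
        prior = sum(fit_targets) / len(fit_targets)
        table = smoothed_means([categories[i] for i in fit], fit_targets, prior, smoothing)
        for i in held_out:
            # Categories unseen in the fitting folds fall back to the prior.
            encoded[i] = table.get(categories[i], prior)
    return encoded
```

Because a row's own label never enters its encoding, this avoids the most direct form of target leakage; the same fitted table (from all training rows) would be applied unchanged at inference time.
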
Modeling and Tuning
- Train at least two model families (e.g., Elastic Net Logistic Regression and Gradient Boosting Trees).
- Perform hyperparameter search within the inner CV loop.
- Compare models using calibrated probabilities.
Evaluation: Stability, Fairness, and Calibration
- Assess temporal stability and confidence intervals.
- Evaluate fairness across region and job_category (group metrics and disparities).
- Evaluate calibration (global and per-segment).
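
Per-segment fairness and calibration checks could be sketched as follows. The choice of metrics (selection rate, recall, and a binned expected calibration error) and the function names are assumptions made for this example; other group metrics may fit a given business context better.

```python
from collections import defaultdict


def expected_calibration_error(y_true, probs, n_bins=10):
    """Weighted mean gap between predicted and observed positive rate per probability bin."""
    bins = defaultdict(list)
    for y, p in zip(y_true, probs):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins.values():
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(observed - predicted)
    return ece


def segment_report(y_true, probs, segments, threshold):
    """Selection rate, recall, and ECE for each segment value (e.g. each region)."""
    rows = defaultdict(list)
    for y, p, s in zip(y_true, probs, segments):
        rows[s].append((y, p))
    report = {}
    for seg, members in rows.items():
        ys = [y for y, _ in members]
        ps = [p for _, p in members]
        selected = [p >= threshold for p in ps]
        positives = sum(ys)
        true_pos = sum(1 for y, sel in zip(ys, selected) if y == 1 and sel)
        report[seg] = {
            "n": len(members),
            "selection_rate": sum(selected) / len(members),
            "recall": true_pos / positives if positives else None,
            "ece": expected_calibration_error(ys, ps),
        }
    return report


def max_disparity(report, metric):
    """Largest gap in a metric between any two segments, ignoring undefined values."""
    values = [r[metric] for r in report.values() if r[metric] is not None]
    return max(values) - min(values)
```

Small segments produce noisy estimates, so in practice each figure would be reported with its sample size (the `n` field) and, ideally, a bootstrap confidence interval before any disparity is acted on.
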
Explainability and Production Monitoring
- Produce model-agnostic feature importance.
- Define a monitoring plan for data drift (e.g., PSI), performance drift, and threshold re-tuning.
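
For the drift-monitoring bullet, the Population Stability Index (PSI) for a single numeric feature or for the model score can be computed along these lines. This is a minimal sketch: the quantile-based binning, the `eps` floor for empty bins, and the function name are choices made here, not part of the original question.

```python
import bisect
import math


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a current sample."""
    ref_sorted = sorted(reference)
    # Bin edges are taken from reference quantiles so each bin starts roughly equal-sized.
    edges = [ref_sorted[int(len(ref_sorted) * k / n_bins)] for k in range(1, n_bins)]

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_share, cur_share = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_share, cur_share))
```

A commonly quoted rule of thumb treats PSI below about 0.1 as stable and above about 0.25 as a significant shift, but those cutoffs are conventions rather than guarantees and would need to be validated against observed performance drift before they trigger retraining or threshold re-tuning.
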

Be explicit about assumptions and how you would validate each step. Keep the design actionable and defensible.
