End-to-End Binary Classification Pipeline on Tabular Data (Numeric, Categorical, Text)
Context
You are handed a tabular dataset that includes numerical features, categorical features (some high-cardinality), and short free-text fields, plus a binary target. Observations have timestamps. The business will act on the model by ranking or thresholding scores (e.g., contact, route, approve) with a limited budget. Positives may be rare. Stakeholders care about stable lift, calibrated probabilities, and fairness across key segments such as region and job_category.
Task
Design a production-ready modeling pipeline that you can defend during an onsite interview. Cover the following:
-
Business Objective, Optimization Metric, and Decision Threshold
-
State a concrete business decision the model supports.
-
Choose an optimization metric appropriate for rare positives (e.g., PR-AUC) and specify any secondary metrics.
-
Define how you will set a decision threshold (or top-K) tied to costs/lift.
-
Data Splitting Strategy
-
Use time-based splits if temporal; otherwise stratified/grouped splits.
-
Incorporate nested cross-validation (outer for unbiased evaluation, inner for tuning).
-
Preprocessing
-
Imputation plans for numeric/categorical/text; add missingness indicators where appropriate.
-
Leakage checks tied to timestamps and label windows.
-
Rare-category handling and high-cardinality encoding (e.g., out-of-fold target encoding with smoothing).
-
Text feature extraction approach.
-
Modeling and Tuning
-
Train at least two model families (e.g., Elastic Net Logistic Regression and Gradient Boosting Trees).
-
Perform hyperparameter search within the inner CV loop.
-
Compare models using calibrated probabilities.
-
Evaluation: Stability, Fairness, and Calibration
-
Assess temporal stability and confidence intervals.
-
Evaluate fairness across regions and job_category (group metrics and disparities).
-
Evaluate calibration (global and per-segment).
-
Explainability and Production Monitoring
-
Produce model-agnostic feature importance.
-
Define a monitoring plan for data drift (e.g., PSI), performance drift, and threshold re-tuning.
Be explicit about assumptions and how you would validate each step. Keep the design actionable and defensible.