End-to-End Binary Classifier Workflow (EDA → Modeling → Fairness → Report)
You are given a labeled tabular dataset and asked to implement a reproducible, end-to-end workflow in Python to analyze the data and train a classifier suitable for deployment.
Assumptions (adapt as needed):
- Input: a CSV file with a binary target column (e.g., target ∈ {0,1}).
- Optional columns: a timestamp column for time-based splits; group columns for fairness checks; an ID column to drop.
- Output: code, metrics, saved model artifact, and a concise text report with a recommended model and expected performance.
Requirements:
- Data access and schema validation
  - Load data; verify required columns exist; basic type checks and duplicate rows/IDs.
  - Summarize numeric/categorical feature counts and missingness.
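A minimal sketch of the loading and schema checks, assuming the CSV has a binary `target` column and an optional `id` column (all names here are placeholders):

```python
import pandas as pd

REQUIRED_COLUMNS = {"target"}  # placeholder; extend with the columns your schema requires

def load_and_validate(path: str, id_col: str = "id") -> pd.DataFrame:
    df = pd.read_csv(path)

    # Schema check: required columns present and a binary target.
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    if not set(df["target"].dropna().unique()) <= {0, 1}:
        raise ValueError("target must be binary (0/1)")

    # Duplicate IDs and exact duplicate rows.
    if id_col in df.columns:
        n_dup_ids = int(df[id_col].duplicated().sum())
        if n_dup_ids:
            print(f"warning: {n_dup_ids} duplicate IDs")
        df = df.drop(columns=[id_col])
    n_dup_rows = int(df.duplicated().sum())
    if n_dup_rows:
        print(f"warning: {n_dup_rows} exact duplicate rows")
    return df
```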
- Exploratory data analysis (EDA)
  - Missing values: counts, percentages, imputation plan.
  - Target leakage checks: suspicious feature names, extremely high target correlation/MI.
  - Class imbalance: distribution and imbalance ratio.
  - Feature distributions: univariate summaries (hist/value counts) and basic outlier flags.
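A sketch of the EDA pieces that feed later decisions: a missingness table, the imbalance ratio, and a crude mutual-information leakage screen (the 0.5 MI cutoff is an arbitrary placeholder):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def eda_summary(df: pd.DataFrame, target: str = "target"):
    # Missingness: counts and percentages per column.
    missing = df.isna().sum().to_frame("n_missing")
    missing["pct_missing"] = 100 * missing["n_missing"] / len(df)

    # Class imbalance ratio (majority count / minority count).
    counts = df[target].value_counts()
    imbalance_ratio = counts.max() / counts.min()

    # Crude leakage screen on numeric features: very high mutual information is suspicious.
    numeric = df.select_dtypes(include=np.number).drop(columns=[target])
    mi = mutual_info_classif(numeric.fillna(numeric.median()), df[target], random_state=0)
    leakage_suspects = numeric.columns[mi > 0.5].tolist()

    return missing, imbalance_ratio, leakage_suspects
```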
- Splitting strategy
  - If a timestamp is present: time-based split (train/validation/test in chronological order).
  - Else: stratified split to preserve the class ratio.
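A sketch of the split logic covering both cases; column names, split fractions, and the seed are placeholders:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_data(df: pd.DataFrame, target: str = "target", ts_col: str | None = None,
               val_size: float = 0.2, test_size: float = 0.2, seed: int = 42):
    if ts_col and ts_col in df.columns:
        # Time-based split: oldest rows train, newest rows test.
        df = df.sort_values(ts_col)
        n = len(df)
        n_train = int(n * (1 - val_size - test_size))
        n_val = int(n * (1 - test_size))
        return df.iloc[:n_train], df.iloc[n_train:n_val], df.iloc[n_val:]

    # Stratified split preserves the class ratio in every partition.
    train, test = train_test_split(df, test_size=test_size, stratify=df[target], random_state=seed)
    train, val = train_test_split(train, test_size=val_size / (1 - test_size),
                                  stratify=train[target], random_state=seed)
    return train, val, test
```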
- Baselines
  - Majority-class baseline and a simple model baseline (e.g., Logistic Regression with minimal tuning).
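A sketch of the two baselines, assuming already-encoded numeric feature matrices (`X_train`, `X_val`) and label vectors; a majority-class dummy scores ROC-AUC ≈ 0.5 and sets the floor any real model must beat:

```python
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def run_baselines(X_train, y_train, X_val, y_val):
    majority = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    logreg = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_train, y_train)
    return {
        "majority_roc_auc": roc_auc_score(y_val, majority.predict_proba(X_val)[:, 1]),
        "logreg_roc_auc": roc_auc_score(y_val, logreg.predict_proba(X_val)[:, 1]),
    }
```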
- Preprocessing
  - Numeric: impute (median), scale (standard).
  - Categorical: impute (most frequent), one-hot encode (handle_unknown="ignore"); consider rare-category handling.
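A sketch of this preprocessing as a scikit-learn ColumnTransformer; the min_frequency rare-category grouping assumes scikit-learn ≥ 1.1, and the threshold of 10 is a placeholder:

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def build_preprocessor(numeric_cols: list[str], categorical_cols: list[str]) -> ColumnTransformer:
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # handle_unknown="ignore" covers categories unseen at fit time;
        # min_frequency groups rare categories into an infrequent bucket.
        ("onehot", OneHotEncoder(handle_unknown="ignore", min_frequency=10)),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
```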
- Imbalance handling
  - Use class weights and/or sample weighting; optionally resampling (SMOTE/undersampling) if justified.
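A minimal sketch of the weighting route (resampling such as SMOTE would come from imbalanced-learn and is omitted here); `X_train`/`y_train` are assumed to come from the split step:

```python
from sklearn.utils.class_weight import compute_sample_weight

def balanced_sample_weight(y_train):
    # "balanced" weights are inversely proportional to class frequencies,
    # so minority-class errors count more during training.
    return compute_sample_weight(class_weight="balanced", y=y_train)

# Usage: linear models can take class_weight="balanced" directly; boosting and most
# other estimators accept sample weights in fit(), e.g.
#   model.fit(X_train, y_train, sample_weight=balanced_sample_weight(y_train))
```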
- Model training
  - Train at least two model families (e.g., Logistic Regression and Gradient Boosting).
  - Use cross-validation with hyperparameter tuning (RandomizedSearchCV or equivalent).
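A sketch of the tuning loop over two model families, reusing the preprocessor above inside a single Pipeline so preprocessing is fit only on training folds; the search spaces and n_iter are placeholders:

```python
from scipy.stats import loguniform, randint
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

def tune_models(preprocessor, X_train, y_train, seed: int = 42) -> dict:
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    candidates = {
        "logreg": (
            LogisticRegression(max_iter=1000, class_weight="balanced"),
            {"clf__C": loguniform(1e-3, 1e2)},
        ),
        "gboost": (
            GradientBoostingClassifier(random_state=seed),
            {"clf__learning_rate": loguniform(1e-2, 3e-1),
             "clf__max_depth": randint(2, 6),
             "clf__n_estimators": randint(100, 500)},
        ),
    }
    results = {}
    for name, (estimator, space) in candidates.items():
        pipe = Pipeline([("prep", preprocessor), ("clf", estimator)])
        search = RandomizedSearchCV(pipe, space, n_iter=20, scoring="average_precision",
                                    cv=cv, random_state=seed, n_jobs=-1)
        results[name] = search.fit(X_train, y_train)
    return results
```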
- Metrics aligned to business goal
  - Compute ROC-AUC and PR-AUC; report F1/precision/recall at a chosen threshold.
  - If FP/FN costs are provided, include a cost-sensitive evaluation.
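A sketch of the threshold-free and threshold-based metrics plus a simple expected-cost summary; the 0.5 threshold and unit costs are placeholders to be replaced by business inputs:

```python
import numpy as np
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             precision_recall_fscore_support, roc_auc_score)

def evaluate(y_true, proba, threshold: float = 0.5,
             fp_cost: float = 1.0, fn_cost: float = 1.0) -> dict:
    pred = (np.asarray(proba) >= threshold).astype(int)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, pred, average="binary", zero_division=0)
    tn, fp, fn, tp = confusion_matrix(y_true, pred).ravel()
    return {
        "roc_auc": roc_auc_score(y_true, proba),
        "pr_auc": average_precision_score(y_true, proba),   # PR-AUC via average precision
        "precision": precision, "recall": recall, "f1": f1,
        "expected_cost": fp * fp_cost + fn * fn_cost,        # cost-sensitive view when costs are known
    }
```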
- Calibration (if needed)
  - Assess calibration; calibrate probabilities (Platt scaling or isotonic regression) if poorly calibrated.
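A sketch of the calibration check, using the Brier score as a rough trigger; the 0.15 cutoff and the use of a dedicated held-out calibration set are assumptions, and in practice the reliability curve should be inspected as well:

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.metrics import brier_score_loss

def maybe_calibrate(model, X_cal, y_cal, brier_cutoff: float = 0.15):
    proba = model.predict_proba(X_cal)[:, 1]
    brier = brier_score_loss(y_cal, proba)
    # frac_positive vs mean_predicted can be plotted as a reliability diagram.
    frac_positive, mean_predicted = calibration_curve(y_cal, proba, n_bins=10)

    if brier > brier_cutoff:
        # Isotonic calibration; use method="sigmoid" (Platt) for small calibration sets.
        calibrated = CalibratedClassifierCV(model, method="isotonic", cv=5)
        return calibrated.fit(X_cal, y_cal), brier
    return model, brier
```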
- Error analysis
  - Confusion matrix and per-slice analysis (e.g., by key categorical features or numeric bins).
  - Feature importances (tree-based) and/or coefficients (linear); optionally SHAP.
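A sketch of per-slice error analysis plus a model-agnostic importance check; it assumes validation-set pandas objects with aligned indexes, and `slice_col` naming whichever column is worth slicing on:

```python
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.metrics import precision_score, recall_score

def slice_metrics(df_val: pd.DataFrame, y_true: pd.Series, y_pred: pd.Series,
                  slice_col: str) -> pd.DataFrame:
    rows = []
    for value, idx in df_val.groupby(slice_col).groups.items():
        rows.append({
            "slice": value,
            "n": len(idx),
            "precision": precision_score(y_true.loc[idx], y_pred.loc[idx], zero_division=0),
            "recall": recall_score(y_true.loc[idx], y_pred.loc[idx], zero_division=0),
        })
    return pd.DataFrame(rows)

# Permutation importance works for both model families (linear and boosted):
# result = permutation_importance(fitted_pipeline, X_val, y_val, n_repeats=5, random_state=0)
```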
- Fairness checks
  - Report metrics by key groups; highlight disparities (e.g., demographic parity, equal opportunity).
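A minimal sketch of the per-group rates behind the two criteria named above: selection rate per group (demographic parity) and true-positive rate per group (equal opportunity); dedicated tooling such as Fairlearn could replace this:

```python
import pandas as pd

def fairness_report(y_true: pd.Series, y_pred: pd.Series, groups: pd.Series) -> pd.DataFrame:
    rows = []
    for g in groups.dropna().unique():
        mask = groups == g
        positives = y_true[mask] == 1
        tp = int(((y_pred[mask] == 1) & positives).sum())
        p = int(positives.sum())
        rows.append({
            "group": g,
            "n": int(mask.sum()),
            "selection_rate": float((y_pred[mask] == 1).mean()),  # demographic parity compares these
            "tpr": tp / p if p else float("nan"),                  # equal opportunity compares these
        })
    return pd.DataFrame(rows)
```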
- Reproducibility and report
  - Save the fitted pipeline, metrics JSON, and environment info.
  - Produce a concise recommendation: chosen model, expected performance, and deployment notes.
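A sketch of the artifact-saving step with joblib plus a small environment record; file and directory names are placeholders:

```python
import json
import platform
from pathlib import Path

import joblib
import sklearn

def save_artifacts(fitted_pipeline, metrics: dict, out_dir: str = "artifacts") -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    joblib.dump(fitted_pipeline, out / "model.joblib")  # fitted preprocessing + model pipeline
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2))
    (out / "environment.json").write_text(json.dumps({
        "python": platform.python_version(),
        "scikit_learn": sklearn.__version__,
    }, indent=2))
```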
Deliverables:
- Python code implementing the above.
- Saved model artifact and metrics.
- Short written recommendation with expected performance and guardrails.