Debug and Fix a Transformer Text Classifier, Then Train and Evaluate It
Context
You inherit a small codebase for a transformer-based text classifier. There are four failing unit tests: two correspond to previously documented ("known") issues; two are unexpected ("novel"). Your task is to make the model train and evaluate correctly, and to demonstrate a robust training/evaluation pipeline on a labeled dataset.
Assumptions (to make the task self-contained):
- Language: Python 3.10+
- Libraries: PyTorch, Hugging Face Transformers, scikit-learn, pandas, numpy
- Dataset: a CSV file with columns `text` (string) and `label` (int); single-label classification with K classes
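The assumed CSV schema above can be validated up front before any training code runs. A minimal sketch, assuming pandas is available; the helper name `load_dataset` is illustrative, not part of the inherited codebase:

```python
import pandas as pd

def load_dataset(path_or_buf) -> pd.DataFrame:
    """Load the classification CSV and check the assumed schema."""
    df = pd.read_csv(path_or_buf)
    missing = {"text", "label"} - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    if not pd.api.types.is_integer_dtype(df["label"]):
        raise TypeError("`label` must be an integer dtype")
    return df
```

Failing fast on a malformed file is cheaper than debugging a shape or dtype error deep inside the training loop.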
Tasks
- Identify the root cause of each failing test (two known bugs, two novel bugs), and fix the model/training code so all tests pass.
- Provide a clean, minimal reference implementation of the model and training loop that avoids these bugs.
- Given a labeled dataset, analyze class balance and basic feature distributions (e.g., text length, token frequency), then train the classifier.
- Report key performance metrics (accuracy, precision, recall, F1; ROC-AUC when binary), and include guardrails for class imbalance and reproducibility.
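For the reference-implementation task, a minimal sketch of the intended shape in PyTorch follows. The class name, hyperparameters, and `train_step` helper are illustrative assumptions, not the inherited codebase's API; the comments flag two classic training-loop bugs of the kind this exercise targets:

```python
import torch
import torch.nn as nn

class TinyTransformerClassifier(nn.Module):
    def __init__(self, vocab_size: int, num_classes: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # Emit raw logits: CrossEntropyLoss applies log-softmax itself, so a
        # softmax here would be a bug.
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        pad_mask = input_ids == 0  # True at padding positions
        h = self.encoder(self.embed(input_ids), src_key_padding_mask=pad_mask)
        # Mean-pool over non-padding positions only; pooling over padding is
        # another common silent bug.
        keep = (~pad_mask).unsqueeze(-1).float()
        pooled = (h * keep).sum(1) / keep.sum(1).clamp(min=1.0)
        return self.head(pooled)

def train_step(model, optimizer, input_ids, labels):
    model.train()
    optimizer.zero_grad()  # forgetting this accumulates stale gradients
    loss = nn.functional.cross_entropy(model(input_ids), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The same structure extends naturally to a Hugging Face backbone by swapping the embedding/encoder for a pretrained model.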
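For the data-analysis task, a sketch of the class-balance and text-length summary, assuming the `text`/`label` schema above; the function name and the specific statistics reported are illustrative choices:

```python
import pandas as pd

def analyze(df: pd.DataFrame) -> dict:
    """Summarize class balance and text-length distribution."""
    counts = df["label"].value_counts().sort_index()
    lengths = df["text"].str.split().str.len()  # whitespace token counts
    return {
        "class_counts": counts.to_dict(),
        "imbalance_ratio": counts.max() / counts.min(),  # 1.0 = balanced
        "length_mean": float(lengths.mean()),
        "length_p95": float(lengths.quantile(0.95)),
    }
```

The imbalance ratio informs whether class weights or stratified splits are needed, and the 95th-percentile length is a reasonable basis for choosing a truncation limit.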
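For the metrics task, a sketch using scikit-learn; macro averaging is the imbalance guardrail here because it weights every class equally instead of letting the majority class dominate. The helper name and output keys are illustrative:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             roc_auc_score)

def report_metrics(y_true, y_pred, y_score=None) -> dict:
    """Macro-averaged metrics; ROC-AUC only when the task is binary."""
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    out = {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision_macro": prec,
        "recall_macro": rec,
        "f1_macro": f1,
    }
    if y_score is not None and len(np.unique(y_true)) == 2:
        out["roc_auc"] = roc_auc_score(y_true, y_score)
    return out
```

For reproducibility, pair this with fixed seeds (`random`, `numpy`, `torch`) and a held-out split created with a fixed `random_state`.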