This question evaluates a candidate's ability to design an end-to-end machine learning pipeline for binary text classification. It covers data understanding and label quality, preprocessing, model selection and training, evaluation and threshold selection, handling of class imbalance and ambiguous labels, and deployment concerns such as latency, monitoring, and safety. Commonly asked in machine learning interviews, it tests both conceptual understanding and practical skill: how well an engineer balances model performance, evaluation metrics, and operational constraints in real-world, safety-sensitive NLP systems.
You are given a text dataset and asked to build a model that predicts whether a piece of content is harmful (binary classification).
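As one illustration of the evaluation and thresholding stage, the sketch below picks a decision threshold from held-out model scores. It assumes the model outputs a harmfulness score in [0, 1], that labels are 0 (benign) or 1 (harmful), and that the product requirement is phrased as a precision floor. The helper names `precision_recall_at` and `pick_threshold` and the toy data are hypothetical, not part of the question.

```python
from typing import List, Optional, Tuple


def precision_recall_at(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when every item with score >= threshold is flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Convention (an assumption): precision is 1.0 when nothing is flagged.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(
    scores: List[float], labels: List[int], min_precision: float
) -> Optional[float]:
    """Return the threshold with the highest recall whose precision meets the floor.

    Candidate thresholds are the distinct observed scores. Returns None when
    no candidate satisfies the precision floor.
    """
    best: Optional[float] = None
    best_recall = -1.0
    for t in sorted(set(scores)):
        p, r = precision_recall_at(scores, labels, t)
        if p >= min_precision and r > best_recall:
            best, best_recall = t, r
    return best


# Toy validation set (hypothetical): model scores and ground-truth labels.
val_scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
val_labels = [0, 0, 1, 1, 1, 0]

threshold = pick_threshold(val_scores, val_labels, min_precision=0.9)
```

With imbalanced data, choosing the threshold against a precision or recall target on a validation set is usually more informative than defaulting to 0.5, since accuracy can look high even when most harmful items are missed.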