Improve classifier with noisy multi-annotator labels

This is a Machine Learning interview question from OpenAI for Machine Learning Engineer roles.

Problem

You are given a text dataset for a binary classification task (label in {0,1}). Each example has been labeled by multiple human annotators, and annotators often disagree (i.e., the same item can have conflicting labels).

You need to:

Perform a dataset/label analysis to understand the disagreement and likely label noise.
Propose a training and evaluation approach that improves offline metrics (e.g., F1 / AUC / accuracy), given the noisy multi-annotator labels.
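
As a starting point for the disagreement analysis, the sketch below computes per-item vote statistics and a rough per-annotator agreement rate. It is an illustration only: the record layout (item_id, annotator_id, label), the toy data, and the function names item_stats and annotator_agreement are assumptions, not part of the problem statement.

```python
from collections import defaultdict
from math import log2

# Hypothetical toy records: (item_id, annotator_id, label)
annotations = [
    ("t1", "a1", 1), ("t1", "a2", 1), ("t1", "a3", 1),
    ("t2", "a1", 1), ("t2", "a2", 0), ("t2", "a3", 0),
    ("t3", "a1", 0), ("t3", "a2", 0), ("t3", "a3", 1),
]

def item_stats(records):
    """Per item: vote count, fraction of positive votes, binary entropy of the votes."""
    votes = defaultdict(list)
    for item_id, _annotator, label in records:
        votes[item_id].append(label)
    stats = {}
    for item_id, labels in votes.items():
        p = sum(labels) / len(labels)
        # Entropy is 0 for unanimous items and 1 bit for an even split.
        entropy = 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))
        stats[item_id] = {"n": len(labels), "p_pos": p, "entropy": entropy}
    return stats

def annotator_agreement(records, stats):
    """Per annotator: share of their labels matching the item's majority vote.

    Tied items are skipped. The majority includes the annotator's own vote,
    so this rate is optimistic when few annotators label each item.
    """
    hits, total = defaultdict(int), defaultdict(int)
    for item_id, annotator, label in records:
        p = stats[item_id]["p_pos"]
        if p == 0.5:
            continue
        majority = int(p > 0.5)
        total[annotator] += 1
        hits[annotator] += int(label == majority)
    return {a: hits[a] / total[a] for a in total}

stats = item_stats(annotations)
agreement = annotator_agreement(annotations, stats)
```

High-entropy items are candidates for genuinely ambiguous text or unclear guidelines, while annotators with unusually low agreement are worth inspecting for systematic bias or carelessness. A chance-corrected statistic such as Cohen's or Fleiss' kappa would be a natural next step.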

Assumptions you may make (state them clearly)

You have access to: raw text, per-annotator labels, annotator IDs, and timestamps.
You can retrain models and change the labeling aggregation strategy, but you may have limited or no ability to collect new labels.

Deliverables

What analyses would you run and what would you look for?
How would you construct train/validation/test splits to avoid misleading offline metrics?
How would you convert multi-annotator labels into training targets?
What model/loss/thresholding/calibration choices would you try, and why?
What failure modes and edge cases could cause offline metric gains to be illusory?
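
One possible way to approach the split and target-construction deliverables is sketched below. It assumes soft targets built from vote fractions, a split keyed on normalized text so that duplicate texts cannot leak across splits, and a cross-entropy loss that accepts soft targets. The function names (soft_target, split_of, soft_cross_entropy) and the 20% test fraction are hypothetical choices, not a prescribed solution.

```python
import hashlib
from math import log

def soft_target(labels):
    """Fraction of annotators who voted 1, used as a probabilistic training target."""
    return sum(labels) / len(labels)

def split_of(text, test_pct=20):
    """Assign a split deterministically from normalized text.

    Lowercasing and collapsing whitespace means exact and near-exact
    duplicates land in the same split, which avoids one source of leakage.
    """
    key = " ".join(text.lower().split())
    bucket = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 100
    return "test" if bucket < test_pct else "train"

def soft_cross_entropy(target, predicted, eps=1e-12):
    """Binary cross-entropy against a soft target in [0, 1]."""
    predicted = min(max(predicted, eps), 1 - eps)  # clamp to avoid log(0)
    return -(target * log(predicted) + (1 - target) * log(1 - predicted))

# Usage: an item with an even 2-2 vote gets target 0.5, and the loss is
# lowest when the model also predicts 0.5 rather than a confident 0.9.
target = soft_target([1, 0, 0, 1])
loss_uncertain = soft_cross_entropy(target, 0.5)
loss_confident = soft_cross_entropy(target, 0.9)
```

With soft targets the model is no longer penalized for expressing uncertainty on items the annotators themselves split on. For evaluation, a common complementary choice is to report metrics separately on a high-agreement subset, since hard-label metrics on contested items mostly measure noise.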
