You are given a training dataset labeled by human annotators, but some annotations are low quality, inconsistent, rushed, adversarial, or simply wrong.
Design a practical method to identify and filter bad annotations before using the data for model training. Your approach should work in a real production setting rather than only in a clean academic setup.
Discuss:
- What signals you would use at the example level and the annotator level
- How you would distinguish hard examples from bad labels
- Whether you would remove, relabel, or down-weight suspicious data
- How you would evaluate the filtering system
- What failure modes and fairness risks you would watch for
If useful, describe how you would implement a scoring pipeline that assigns a quality score to each annotation.
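As a starting point, one minimal sketch of such a scoring pipeline (names and the 50/50 weighting are illustrative assumptions, not a prescribed design) combines two agreement signals: whether each label matches the per-example majority vote, and the annotator's overall agreement rate across all their labels.

```python
from collections import Counter, defaultdict

def score_annotations(annotations):
    """Assign a quality score in [0, 1] to each (example_id, annotator_id) pair.

    annotations: list of (example_id, annotator_id, label) triples, where each
    example has labels from multiple annotators.

    The score is an even blend of (a) agreement with the example's majority
    label and (b) the annotator's overall agreement rate with majority labels.
    Illustrative only: a production system would add more signals (e.g. time
    per item, model confidence) and calibrate the weights.
    """
    by_example = defaultdict(list)
    for ex, ann, label in annotations:
        by_example[ex].append((ann, label))

    # Majority label per example (ties broken arbitrarily by Counter order).
    majority = {ex: Counter(lbl for _, lbl in rows).most_common(1)[0][0]
                for ex, rows in by_example.items()}

    # Per-annotator agreement rate with the majority labels.
    agree, total = Counter(), Counter()
    for ex, rows in by_example.items():
        for ann, label in rows:
            total[ann] += 1
            agree[ann] += label == majority[ex]
    annotator_rate = {a: agree[a] / total[a] for a in total}

    scores = {}
    for ex, rows in by_example.items():
        for ann, label in rows:
            example_agree = 1.0 if label == majority[ex] else 0.0
            scores[(ex, ann)] = 0.5 * example_agree + 0.5 * annotator_rate[ann]
    return scores
```

Low scores would then feed the remove / relabel / down-weight decision; note that pure majority agreement penalizes annotators who are correct on genuinely hard examples, which is exactly the hard-example-vs-bad-label distinction the question asks about.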