Two-Annotator Labeling Policy: Precision, Recall, F1, and Generalization
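Two annotators independently review each video and label it "illegal" or "legal." Assume their labels are conditionally independent given the video's true class, and that both annotators share the following parameters: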
- Sensitivity (true positive rate per annotator): s = 0.80
- False positive rate per annotator: f = 0.02 (so specificity = 0.98)
- Base rate (prevalence of illegal videos): p = 0.01
Policies to evaluate:
- At-least-one (OR): Flag a video as illegal if at least one annotator says "illegal."
- Both (AND): Flag a video as illegal only if both annotators say "illegal."
Tasks:
- For the OR policy, compute precision (PPV) and recall (TPR).
- For the AND policy, compute PPV and TPR.
- Compute the F1 score for each and state which policy has the higher F1. Show your calculations.
- Generalize the formulas to n annotators under majority vote (flag a video when at least t = ceil((n+1)/2) annotators say "illegal") and arbitrary p, s, f.
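
One way to check the calculations is a short Python sketch. It assumes the annotators' labels are conditionally independent given the true class, so the number of "illegal" votes is binomial with success probability s for illegal videos and f for legal ones. The function names (tail_prob, policy_metrics, majority_threshold) are illustrative and not part of the problem statement.

```python
from math import ceil, comb


def tail_prob(n, t, q):
    """P(at least t of n annotators say 'illegal'), each independently with probability q."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(t, n + 1))


def majority_threshold(n):
    """Majority-vote threshold from the problem statement: t = ceil((n + 1) / 2)."""
    return ceil((n + 1) / 2)


def policy_metrics(n, t, p, s, f):
    """Return (PPV, TPR, F1) for the rule 'flag if at least t of n annotators say illegal'.

    Assumes conditional independence of annotators given the true class.
    """
    tpr = tail_prob(n, t, s)            # policy-level recall
    fpr = tail_prob(n, t, f)            # policy-level false positive rate
    flagged = p * tpr + (1 - p) * fpr   # P(video is flagged)
    ppv = p * tpr / flagged if flagged > 0 else float("nan")
    f1 = 2 * ppv * tpr / (ppv + tpr) if (ppv + tpr) > 0 else float("nan")
    return ppv, tpr, f1


if __name__ == "__main__":
    p, s, f = 0.01, 0.80, 0.02
    for name, t in (("OR", 1), ("AND", 2)):
        ppv, tpr, f1 = policy_metrics(2, t, p, s, f)
        print(f"{name}: PPV={ppv:.4f} TPR={tpr:.4f} F1={f1:.4f}")
```

With n = 2, the OR policy corresponds to t = 1 and the AND policy to t = 2, which is also the majority threshold for two annotators. Under the stated parameters this gives roughly PPV 0.197, TPR 0.96, F1 0.327 for OR and PPV 0.942, TPR 0.64, F1 0.762 for AND, so AND has the higher F1. For the general case, call policy_metrics(n, majority_threshold(n), p, s, f).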