Design human review to estimate model accuracy

Q: Design human review to estimate model accuracy

This question evaluates a candidate's competency in statistical experimental design, accuracy estimation under noisy human labels, and reasoning about bias–variance trade-offs when allocating a fixed budget of annotation reviews.


Question

You need to estimate the accuracy of an ML classifier on a population of subjects.

You can only afford K total human reviews. Each human review produces a binary judgment (0/1) for a subject (assume it is intended to represent the “true label,” but reviewers may be noisy).

You must choose how to allocate reviews:

Option 1: review K different subjects once each (1 review per subject).
Option 2: review fewer subjects, but assign multiple independent reviews per subject, and use majority vote (or another aggregation).
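To make the two allocations concrete, here is a minimal simulation sketch. It is not part of the original question and rests on assumptions chosen only for illustration: the classifier is binary, each review is reduced to "was the model right on this subject?", every reviewer flips that value independently with the same probability `error_rate`, and subjects are sampled at random. The function names and the default parameter values are hypothetical.

```python
import random
import statistics


def noisy_review(truth, error_rate, rng):
    """One reviewer judgment: the true 0/1 value, flipped with probability error_rate."""
    return 1 - truth if rng.random() < error_rate else truth


def estimate_accuracy(k_total, reviews_per_subject, true_accuracy, error_rate, rng):
    """Spend k_total reviews at reviews_per_subject each and return the fraction
    of sampled subjects whose majority vote says the model was right.
    Use an odd reviews_per_subject so the majority vote cannot tie."""
    n_subjects = k_total // reviews_per_subject
    judged_correct = 0
    for _ in range(n_subjects):
        # Assumed ground truth: the model is right with probability true_accuracy.
        model_correct = 1 if rng.random() < true_accuracy else 0
        votes = sum(noisy_review(model_correct, error_rate, rng)
                    for _ in range(reviews_per_subject))
        judged_correct += 1 if 2 * votes > reviews_per_subject else 0
    return judged_correct / n_subjects


def compare_designs(k_total=300, true_accuracy=0.9, error_rate=0.1,
                    trials=1000, seed=0):
    """Return {reviews_per_subject: (bias, std_dev)} for Option 1 (r=1)
    and one instance of Option 2 (r=3), under the same review budget."""
    rng = random.Random(seed)
    summary = {}
    for r in (1, 3):
        estimates = [estimate_accuracy(k_total, r, true_accuracy, error_rate, rng)
                     for _ in range(trials)]
        summary[r] = (statistics.mean(estimates) - true_accuracy,
                      statistics.pstdev(estimates))
    return summary


if __name__ == "__main__":
    for r, (bias, sd) in compare_designs().items():
        print(f"reviews per subject = {r}: bias = {bias:+.4f}, std dev = {sd:.4f}")
```

Under these assumed settings, one review per subject should give the smaller spread (more subjects) but the larger bias (label noise is not averaged out), while three reviews per subject should shrink the bias at the cost of a wider spread. Which effect dominates depends on the reviewer error rate and on K, which is the trade-off the question asks about.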

Question: Which option is better for estimating the model’s accuracy, and under what assumptions? Provide a statistical argument, discuss bias/variance trade-offs, and propose a practical review design (including how you would quantify uncertainty with a confidence interval).
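One possible way to attach a confidence interval to the estimate is a Wilson score interval on the observed agreement rate, sketched below. This is an illustrative choice, not a method stated in the question. Here `n` is the number of reviewed subjects (not the number of reviews), and `successes` is the number of subjects whose aggregated label agrees with the model. The `debias` helper is a further assumption: it applies only if the aggregated label has a known, symmetric error rate below 0.5, in which case the observed rate q and the true accuracy a satisfy q = a(1 - e) + (1 - a)e. Both function names are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion.
    n is the number of subjects, not the number of reviews."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def debias(observed_rate, error_rate):
    """Invert q = a(1 - e) + (1 - a)e for a, assuming a known symmetric
    label error rate e < 0.5. The result is clipped to [0, 1]."""
    if not 0 <= error_rate < 0.5:
        raise ValueError("error_rate must be in [0, 0.5)")
    corrected = (observed_rate - error_rate) / (1 - 2 * error_rate)
    return min(1.0, max(0.0, corrected))


if __name__ == "__main__":
    lo, hi = wilson_interval(successes=82, n=100)
    print(f"observed agreement interval: ({lo:.3f}, {hi:.3f})")
    # Mapping the endpoints through debias treats error_rate as exactly known,
    # so the corrected interval understates uncertainty if e is itself estimated.
    print(f"noise-corrected interval: ({debias(lo, 0.1):.3f}, {debias(hi, 0.1):.3f})")
```

The interval on the raw agreement rate describes agreement with noisy labels, not accuracy against the truth. The correction step widens the interval by a factor of 1 / (1 - 2e), which is one way to see that label noise costs precision even when its rate is known.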
