PracHub

Design human review to estimate model accuracy

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in statistical experimental design, accuracy estimation under noisy human labels, and reasoning about bias–variance trade-offs when allocating a fixed budget of annotation reviews.

Company: Google

Role: Data Scientist

Category: Statistics & Math

Difficulty: Hard

Interview Round: Onsite


Related Interview Questions

  • Estimate weather’s effect on mental health - Google (easy)
  • Explain Bootstrap and Statistical Inference - Google (hard)
  • Explain Bootstrap and Prove Uniformity - Google (hard)
  • Can bootstrap help reduce variance - Google (medium)
  • Compute precision under noisy annotators - Google (medium)
Aug 5, 2025, 12:00 AM

You need to estimate the accuracy of an ML classifier on a population of subjects.

You can only afford K total human reviews. Each human review produces a binary judgment (0/1) for a subject (assume it is intended to represent the “true label,” but reviewers may be noisy).

You must choose how to allocate reviews:

  • Option 1: review K different subjects once each (1 review per subject).
  • Option 2: review fewer subjects, but assign multiple independent reviews per subject, and use majority vote (or another aggregation).
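
One way to formalize the two options is sketched below. Everything in it is an assumption added for illustration and is not part of the prompt: subjects are sampled at random from a large population, a is the true model accuracy, each review is wrong independently with the same probability e < 1/2 regardless of the subject or of whether the model is right, m (odd) is the number of reviews per subject, and n = K/m is the number of subjects reviewed. Option 1 is the case m = 1.

```latex
% Probability that the majority of m independent reviews is wrong (m odd):
e_m = \sum_{j=(m+1)/2}^{m} \binom{m}{j} e^{j} (1-e)^{m-j}, \qquad e_1 = e .

% Probability that the model agrees with the aggregated human label:
q_m = a(1-e_m) + (1-a)e_m = e_m + (1-2e_m)\,a .

% Naive estimator: observed agreement rate over n = K/m subjects.
\mathbb{E}[\hat q_m] = q_m, \qquad
\operatorname{Bias}(\hat q_m) = q_m - a = e_m(1-2a), \qquad
\operatorname{Var}(\hat q_m) = \frac{q_m(1-q_m)}{n} = \frac{m\,q_m(1-q_m)}{K} .

% Mean squared error of the naive estimator:
\operatorname{MSE}(\hat q_m) = e_m^{2}(1-2a)^{2} + \frac{m\,q_m(1-q_m)}{K} .

% If e_m is known (for example from a gold-set audit), a corrected estimator:
\hat a = \frac{\hat q_m - e_m}{1-2e_m}, \qquad
\mathbb{E}[\hat a] = a, \qquad
\operatorname{Var}(\hat a) = \frac{m\,q_m(1-q_m)}{K\,(1-2e_m)^{2}} .
```

Under these assumptions the bias term does not shrink as K grows, while the variance term does. More reviews per subject lower e_m and hence the bias of the naive estimate, but at the cost of fewer subjects and a larger sampling variance. When reviewer errors are correlated or depend on the subject, the formula for e_m no longer holds and majority vote helps less than it suggests.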

Question: Which option is better for estimating the model’s accuracy, and under what assumptions? Provide a statistical argument, discuss bias/variance trade-offs, and propose a practical review design (including how you would quantify uncertainty with a confidence interval).
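
As a rough numerical companion to the question, the sketch below compares the two allocations and builds a confidence interval. It is not a reference solution. It assumes a symmetric, independent per-review error rate e below 0.5 that does not depend on the subject or on whether the model is correct, simple random sampling of subjects, and an odd number m of reviews per subject. The function names (majority_error, naive_stats, wilson_interval, corrected_interval) are hypothetical helpers introduced here.

```python
from math import comb, sqrt


def majority_error(e: float, m: int) -> float:
    """P(majority of m independent reviews is wrong), per-review error e, m odd."""
    if m % 2 == 0:
        raise ValueError("use an odd m so that majority vote has no ties")
    return sum(comb(m, j) * e**j * (1 - e) ** (m - j)
               for j in range((m + 1) // 2, m + 1))


def naive_stats(a: float, e: float, K: int, m: int):
    """Bias, variance and MSE of the raw agreement rate as an estimate of accuracy a.

    K is the total review budget; n = K // m subjects each get m reviews.
    """
    n = K // m
    e_m = majority_error(e, m)
    q = e_m + (1 - 2 * e_m) * a      # P(model agrees with aggregated label)
    bias = q - a                     # equals e_m * (1 - 2a)
    var = q * (1 - q) / n            # binomial sampling variance
    return bias, var, bias**2 + var


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def corrected_interval(agreements: int, n: int, e_m: float, z: float = 1.96):
    """Interval for the true accuracy, assuming e_m is known exactly.

    Maps the Wilson interval for the agreement rate q through the monotone
    correction a = (q - e_m) / (1 - 2 e_m) and clips the result to [0, 1].
    Uncertainty in e_m itself is ignored here.
    """
    lo, hi = wilson_interval(agreements, n, z)
    scale = 1 - 2 * e_m

    def clip(x: float) -> float:
        return min(1.0, max(0.0, x))

    return clip((lo - e_m) / scale), clip((hi - e_m) / scale)


if __name__ == "__main__":
    # Illustrative values only: true accuracy 0.9, reviewer error 0.1, budget 3000.
    for m in (1, 3, 5):
        bias, var, mse = naive_stats(a=0.9, e=0.1, K=3000, m=m)
        print(f"m={m}: bias={bias:+.4f} var={var:.2e} mse={mse:.2e}")
    # Example interval: 820 agreements out of 1000 subjects, single review each.
    print(corrected_interval(820, 1000, majority_error(0.1, 1)))
```

With a negligible reviewer error rate, spreading the budget over K distinct subjects minimizes variance and there is little bias to remove. With a noticeable error rate and a large budget, the bias of single noisy reviews can dominate, and repeated reviews or an explicit noise correction become more attractive. In practice the error rate would itself have to be estimated, for instance from a small doubly-reviewed or gold-labeled subset, and that extra uncertainty would widen the interval.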
