
Compare two rare-event detection models statistically

Last updated: Apr 10, 2026

Quick Overview

This question evaluates a candidate's competency in statistical model evaluation for rare-event detection. It covers the choice of appropriate metrics for imbalanced data, the construction of confidence intervals and hypothesis tests from small samples, and practical considerations such as calibration, thresholding, paired versus unpaired comparisons, and cost/alert-budget trade-offs. Commonly asked in machine-learning and data-science interviews, it tests both conceptual understanding of statistical assumptions and the practical ability to quantify uncertainty from limited aggregated results and support a decision.

  • easy
  • Waymo
  • Machine Learning
  • Data Scientist

Company: Waymo

Role: Data Scientist

Category: Machine Learning

Difficulty: easy

Interview Round: Onsite

You are evaluating two models (Model A and Model B) for rare-event detection (e.g., fraud, abuse, medical adverse event). Positives are extremely rare.

You are given only limited evaluation results (e.g., a small number of aggregated counts such as TP/FP/FN/TN, or precision/recall at a chosen threshold) for each model; assume you have enough information to derive confusion-matrix counts, but the number of positives is small.

Questions

  1. Which metrics are most appropriate for comparing the models in a rare-event setting (and why not accuracy/ROC-AUC alone)?
  2. How would you compare Model A vs Model B with statistical uncertainty?
    • Write down the relevant distributions/assumptions (e.g., binomial) and how you'd compute confidence intervals.
    • How would you test whether one model is significantly better?
  3. If you only have "a few numbers" (small sample), what would you do to make a decision responsibly? Include thresholds, calibration, and cost/alert-budget considerations.

State your assumptions (paired vs unpaired evaluation, fixed threshold vs full curve, etc.).
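
Illustrative sketches for each part follow. For intuition on Question 1, a small worked example (with a hypothetical prevalence of 0.1%) shows why raw accuracy is uninformative when positives are rare: a model that never alerts looks 99.9% accurate while catching nothing.

```python
# Hypothetical counts at 0.1% prevalence: 1,000 positives in 1,000,000 cases.
n, n_pos = 1_000_000, 1_000

# A degenerate "always negative" model: TP = 0, FN = n_pos, TN = n - n_pos, FP = 0.
accuracy = (n - n_pos) / n   # 0.999 -- looks excellent
recall = 0 / n_pos           # 0.0   -- catches no positives at all
print(f"accuracy={accuracy:.3f}, recall={recall:.1f}")
```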
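
For the confidence intervals in Question 2, one common sketch (not the only valid choice) treats the relevant metric as a binomial proportion: recall is TP successes out of TP+FN actual positives, and precision is TP out of TP+FP alerts. A Wilson score interval behaves better than the normal approximation when counts are small. The counts below are hypothetical placeholders.

```python
import math

def wilson_ci(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return (center - half, center + half)

# Hypothetical confusion-matrix counts for Model A at a fixed threshold.
tp, fp, fn = 18, 40, 7
print("recall CI:", wilson_ci(tp, tp + fn))     # TP out of all true positives
print("precision CI:", wilson_ci(tp, tp + fp))  # TP out of all alerts
```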
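
For the significance test, if the evaluation is paired (both models scored on the same cases), a McNemar-style test on the discordant pairs is a natural sketch; with few discordant cases, the exact binomial version is safer than the chi-square approximation. The discordant counts here are hypothetical.

```python
from scipy.stats import binomtest  # SciPy >= 1.7

# Paired evaluation on the same cases: count cases where exactly one model is correct.
b = 9  # hypothetical: A correct, B wrong
c = 3  # hypothetical: B correct, A wrong

# Exact McNemar test: under H0 (equal error rates), b ~ Binomial(b + c, 0.5).
result = binomtest(b, n=b + c, p=0.5, alternative="two-sided")
print(f"discordant pairs={b + c}, p-value={result.pvalue:.3f}")
```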
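
For Question 3, with only a handful of positives, one hedged option is a Bayesian treatment: put a Beta prior on each model's recall, form Beta posteriors from the counts, and estimate the probability that one model is better by Monte Carlo sampling, rather than leaning on asymptotic tests. The prior and counts below are illustrative assumptions; the same machinery gives credible intervals you can weigh against cost and alert-budget constraints.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts: (TP, FN) per model on the same small positive set.
a_tp, a_fn = 18, 7
b_tp, b_fn = 14, 11

# Beta(1, 1) uniform prior + binomial likelihood -> Beta posterior on recall.
recall_a = rng.beta(1 + a_tp, 1 + a_fn, size=100_000)
recall_b = rng.beta(1 + b_tp, 1 + b_fn, size=100_000)

print("P(recall_A > recall_B) ~", (recall_a > recall_b).mean())
print("95% credible interval for A:", np.percentile(recall_a, [2.5, 97.5]).round(3))
```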

Related Interview Questions

  • Design an Online Experiment - Waymo (medium)
  • How to predict vehicles’ turn direction at an intersection? - Waymo (easy)
  • Implement K-means and handle train-inference mismatch - Waymo (easy)