PracHub

Estimate population singletons from a 10% log

Last updated: Apr 28, 2026

Quick Overview

This question evaluates mastery of statistical estimation and sampling theory, including recognition of bias from subsampling, frequency‑of‑frequencies modeling for rare events, uncertainty quantification under heavy‑tailed counts, and design of simulation studies.

Company: Google

Role: Data Scientist

Category: Statistics & Math

Difficulty: Medium

Interview Round: Technical Screen

A daily search log has one row per query event, each recording the query string. You draw a 10% simple random sample of rows without replacement. Define a “unique query” (singleton) as a query appearing exactly once in the full day’s log.

a) Explain why estimating the number of singletons by counting singletons in the 10% sample and multiplying by 10 is biased; determine the bias direction and give intuition.

b) Derive a better estimator using a frequency‑of‑frequencies model: relate sampled counts f_k to population counts F_k under binomial thinning, and propose a Poisson/negative‑binomial mixture or Good–Turing/Chao‑type estimator for F_1.

c) Outline how you would compute standard errors (delta method, bootstrap) and diagnose model misspecification under heavy‑tailed query frequencies.

d) Describe a simulation plan to compare estimators across realistic traffic distributions.
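
As a starting point for parts (a) and (b), the binomial-thinning relation can be written out as below. This is a sketch, not a full solution: it assumes the without-replacement row sample is well approximated by keeping each row independently with probability p = 0.1 (reasonable for a large log), and it uses f_k and F_k for the number of distinct queries with count k in the sample and in the full log.

```latex
% Assumption: each row is kept independently with probability p = 0.1,
% so a query with population count j has sample count ~ Binomial(j, p).
\begin{align*}
\mathbb{E}[f_k] &= \sum_{j \ge k} F_j \binom{j}{k} p^{k} (1-p)^{j-k}, \\
\mathbb{E}\!\left[\frac{f_1}{p}\right] &= F_1 + \sum_{j \ge 2} j\,(1-p)^{j-1} F_j \;\ge\; F_1, \\
F_1 &= \sum_{j \ge 1} j\,\frac{1}{p}\left(1 - \frac{1}{p}\right)^{j-1} \mathbb{E}[f_j].
\end{align*}
```

The second line gives the bias direction: every query that occurs two or more times in the full log has some chance of landing exactly once in the sample, so the scaled-up sample singleton count overestimates F_1. The third line inverts the first; with p = 0.1 its coefficients alternate in sign and grow like 9^(j-1), which is why a smoothed or model-based estimator is usually preferred over plugging raw f_j into it.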
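
The bias in part (a) can also be checked empirically, in the spirit of the simulation plan in part (d). The sketch below is illustrative only: the function name `simulate_naive_bias`, the floored and capped Pareto population, and all default parameters are assumptions made for this example, not part of the question.

```python
import random
from collections import Counter


def simulate_naive_bias(n_queries=5000, rate=0.10, alpha=1.2, seed=0):
    """Compare the true singleton count with the naive scaled-up sample count.

    Assumption: per-query population counts are a floored Pareto(alpha)
    draw capped at 2000, chosen only to get a heavy-tailed toy population.
    """
    rng = random.Random(seed)

    # Population count for each hypothetical query id (always >= 1).
    pop_counts = [min(int(rng.paretovariate(alpha)), 2000) for _ in range(n_queries)]
    true_singletons = sum(1 for c in pop_counts if c == 1)

    # Expand to one row per occurrence, mirroring the log in the question.
    rows = [q for q, c in enumerate(pop_counts) for _ in range(c)]

    # Simple random sample of rows without replacement.
    n_sample = int(rate * len(rows))
    sample = rng.sample(rows, n_sample)

    # f_1: number of queries seen exactly once in the sample.
    sample_counts = Counter(sample)
    f1 = sum(1 for c in sample_counts.values() if c == 1)

    return {
        "true_singletons": true_singletons,
        "sample_singletons": f1,
        "naive_estimate": f1 / rate,
        "n_rows": len(rows),
        "n_sample": n_sample,
    }


if __name__ == "__main__":
    print(simulate_naive_bias())
```

On a population like this the naive estimate should come out well above the true singleton count, because many queries with two or more occurrences show up exactly once in the sample. A fuller study would repeat this over many seeds and traffic shapes and report bias and variance for each candidate estimator.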
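
For part (b), one way to sanity-check a thinning-based estimator is to apply it to exact expected sample counts and confirm it recovers F_1. The sketch below assumes independent Bernoulli(p) thinning of rows as an approximation to the without-replacement sample; the function names and the toy frequency table are hypothetical.

```python
from math import comb


def expected_sample_counts(pop_freqs, p):
    """E[f_k] under binomial thinning: sum_j F_j * C(j, k) * p^k * (1-p)^(j-k).

    pop_freqs maps a population count j to F_j, the number of queries with
    that count.
    """
    max_j = max(pop_freqs)
    expected = {}
    for k in range(1, max_j + 1):
        expected[k] = sum(
            F_j * comb(j, k) * p**k * (1 - p) ** (j - k)
            for j, F_j in pop_freqs.items()
            if j >= k
        )
    return expected


def inverted_singletons(sample_freqs, p):
    """Thinning-inversion estimate of F_1: sum_j f_j * j * (1/p) * (1 - 1/p)^(j-1)."""
    return sum(
        f_j * j * (1 / p) * (1 - 1 / p) ** (j - 1)
        for j, f_j in sample_freqs.items()
    )


if __name__ == "__main__":
    toy_population = {1: 1000, 2: 300, 3: 100, 5: 20}  # hypothetical F_j values
    ef = expected_sample_counts(toy_population, 0.1)
    print("naive (scaled f_1):", ef[1] / 0.1)
    print("inverted estimate:", inverted_singletons(ef, 0.1))
```

Fed exact expectations, the inversion returns F_1 while the naive scaling overshoots. On a real sample the same formula is very noisy, since with p = 0.1 the weight on f_j has magnitude 10·j·9^(j-1), which motivates the mixture or Good–Turing/Chao-type smoothing the question asks about.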

Related Interview Questions

  • Estimate weather’s effect on mental health - Google (easy)
  • Explain Bootstrap and Statistical Inference - Google (hard)
  • Explain Bootstrap and Prove Uniformity - Google (hard)
  • Can bootstrap help reduce variance - Google (medium)
  • Compute precision under noisy annotators - Google (medium)
Oct 13, 2025, 9:49 PM