Estimate percentile from buckets

Last updated: Mar 29, 2026

Quick Overview

This question evaluates statistical estimation skills, understanding of histogram-based quantile approximation, and the ability to reason about aggregated data distributions and interpolation under limited information.

Company: Google

Role: Data Scientist

Category: Statistics & Math

Difficulty: medium

Interview Round: Technical Screen

Feb 5, 2025, 12:00 AM

You are given an approximate histogram of search-query frequencies. Each bucket i is represented as (left_bd_i, right_bd_i, bucket_count_i), where:

  • left_bd_i and right_bd_i are the lower and upper boundaries of the bucket,
  • bucket_count_i is the number of queries whose true search_count falls in that bucket.

Assume there are K non-overlapping buckets sorted by boundary, and you do not have access to the raw per-query search_count values.

How would you estimate the nth percentile of the underlying search_count distribution?

Your answer should address:

  1. How to identify which bucket contains the desired percentile.
  2. Why simply returning the midpoint of that bucket can be a poor estimate.
  3. How to improve the estimate using interpolation within the bucket.
  4. What assumptions are required for that interpolation to be reasonable.
  5. How to handle edge cases such as empty buckets, percentiles near 0 or 100, and coarse or uneven bucket widths.
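
One common way to approach the five points above is cumulative counting followed by linear interpolation inside the bucket that contains the target rank. The sketch below is illustrative only: the function name `estimate_percentile` and the `Bucket` alias are hypothetical, and it assumes values are spread uniformly within each bucket, that every bucket has finite boundaries, and that the percentile is defined by the rank `n / 100 * total`.

```python
from typing import List, Tuple

# Hypothetical alias for (left_bd, right_bd, bucket_count).
Bucket = Tuple[float, float, int]


def estimate_percentile(buckets: List[Bucket], n: float) -> float:
    """Estimate the n-th percentile (0 <= n <= 100) from sorted buckets.

    Assumes a uniform spread of values inside each bucket, which is
    what justifies linear interpolation.
    """
    if not 0 <= n <= 100:
        raise ValueError("n must be between 0 and 100")

    # Empty buckets hold no observations, so they cannot contain the target.
    nonempty = [b for b in buckets if b[2] > 0]
    if not nonempty:
        raise ValueError("histogram has no observations")

    total = sum(count for _, _, count in nonempty)
    target = n / 100 * total  # rank of the desired percentile

    cumulative = 0
    for left, right, count in nonempty:
        if cumulative + count >= target:
            # Share of this bucket's count that lies below the target rank.
            fraction = (target - cumulative) / count
            return left + fraction * (right - left)
        cumulative += count

    # Only reachable through floating-point round-off when n is near 100.
    return nonempty[-1][1]


buckets = [(0, 10, 10), (10, 20, 30), (20, 40, 0), (40, 100, 60)]
print(estimate_percentile(buckets, 20))  # about 13.33; the bucket midpoint is 15
```

The midpoint returns the same value for every rank that lands in a bucket, so its error can be as large as half the bucket width; interpolation uses how far the target rank sits inside the bucket. The uniform assumption is weakest in wide buckets and for skewed data. Search counts are often heavy-tailed, in which case interpolating on a log scale may be a better fit, but that is a modeling choice rather than something the histogram itself can confirm. In this sketch, `n = 0` and `n = 100` return the outer boundaries of the first and last non-empty buckets, and when the target rank falls exactly on a boundary that is followed by empty buckets, the lower end of the gap is returned even though any value in the gap is equally consistent with the data. An open-ended final bucket would need a separate tail assumption, which is not handled here.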
