You are given a histogram-style summary of query popularity. Each bucket is represented as (left_bd, right_bd, count), meaning that exactly count queries have search_count values in the half-open interval [left_bd, right_bd) for a fixed analysis window. There are K such buckets, sorted by their boundaries and non-overlapping, and together they cover the full data range. You do not have the individual query-level search_count values.
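The stated invariants (sorted, non-overlapping, covering the range) amount to the buckets being contiguous: each one starts exactly where the previous one ends. The sketch below checks this. It assumes the buckets arrive as a Python list of (left_bd, right_bd, count) tuples, and the name validate_buckets is hypothetical. It can only confirm there are no gaps or overlaps between buckets. Whether they cover the true data range cannot be checked from the summary alone.

```python
from typing import List, Tuple

# Hypothetical representation: (left_bd, right_bd, count) for [left_bd, right_bd).
Bucket = Tuple[float, float, int]


def validate_buckets(buckets: List[Bucket]) -> None:
    """Raise ValueError if the buckets are not sorted, contiguous, and well-formed."""
    if not buckets:
        raise ValueError("need at least one bucket")
    for i, (left, right, count) in enumerate(buckets):
        if not left < right:
            raise ValueError(f"bucket {i} has empty or inverted bounds")
        if count < 0:
            raise ValueError(f"bucket {i} has a negative count")
        # Sorted + non-overlapping + no gaps means each bucket must begin
        # exactly at the previous bucket's right boundary.
        if i > 0 and buckets[i - 1][1] != left:
            raise ValueError(f"bucket {i} does not start where bucket {i - 1} ends")
```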
Given only these bucket summaries, estimate the nth percentile of search_count.
In your answer:
- Explain how to identify which bucket contains the percentile.
- Discuss why simply returning the midpoint (left_bd + right_bd) / 2 of that bucket can be a poor approximation.
- Provide a better approximation method using interpolation within the bucket.
- State any assumptions and edge cases, especially when bucket widths are unequal or the distribution within a bucket may be skewed.
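Here is a minimal sketch of one possible answer. It assumes p is given on a 0 to 100 scale and that values are spread uniformly inside each bucket. It takes running totals of the counts and uses a binary search to find the first bucket whose running total reaches the target rank p * N / 100. It then interpolates linearly between that bucket's bounds instead of returning its midpoint. The name estimate_percentile is hypothetical.

```python
import bisect
from itertools import accumulate
from typing import List, Tuple

# Hypothetical representation: (left_bd, right_bd, count) for [left_bd, right_bd).
Bucket = Tuple[float, float, int]


def estimate_percentile(buckets: List[Bucket], p: float) -> float:
    """Estimate the p-th percentile (0 <= p <= 100) from bucket summaries.

    Assumes buckets are sorted and contiguous, and that values are spread
    uniformly inside each bucket (the linear-interpolation assumption).
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    cum = list(accumulate(count for _, _, count in buckets))  # running totals
    total = cum[-1] if cum else 0
    if total == 0:
        raise ValueError("histogram contains no queries")

    target = p * total / 100  # rank in [0, total] that the percentile maps to
    if target == 0:
        # p = 0: skip leading empty buckets and use the first non-empty one.
        i = bisect.bisect_right(cum, 0)
    else:
        # First bucket whose running total reaches the target rank; because
        # the previous running total is below target, this bucket has count > 0.
        i = bisect.bisect_left(cum, target)

    left, right, count = buckets[i]
    before = cum[i - 1] if i > 0 else 0   # queries in earlier buckets
    fraction = (target - before) / count  # relative position inside the bucket
    return left + fraction * (right - left)
```

For example, with buckets [(0, 10, 50), (10, 100, 40), (100, 1000, 10)], the running totals are 50, 90, 100. The 80th percultile has target rank 80, which falls in the second bucket. Interpolation gives 10 + (30 / 40) * 90 = 77.5. The midpoint rule would report 55 no matter where the target rank sits inside that bucket, and the error grows with bucket width. If values inside a wide bucket are believed to be skewed (for example, heavy-tailed, with most queries near left_bd), the uniform assumption will overestimate. One alternative is to interpolate on a log scale, which requires left_bd > 0. That variant is not shown here.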