Answer the following two probability/statistics questions.
- You are given access to a function `rand01()` that returns independent samples from Uniform(0, 1). Write a function `sample_square()` that returns a 2D point `(x, y)` uniformly distributed over the square (-1, 1) x (-1, 1).
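
One possible approach to this question, sketched in Python: rescale two independent draws with the affine map u -> 2u - 1. The body of `rand01()` below is only a stand-in built on the standard library's `random.random()`; the question treats it as given. This is a minimal sketch under that assumption, not the only valid answer.

```python
import random


def rand01():
    """Stand-in (assumption) for the given primitive: one Uniform(0, 1) sample."""
    return random.random()


def sample_square():
    """Return a point (x, y) uniform over the square (-1, 1) x (-1, 1)."""
    # The affine map u -> 2u - 1 stretches Uniform(0, 1) to Uniform(-1, 1).
    x = 2.0 * rand01() - 1.0
    # A second, independent draw keeps x and y independent, so the joint
    # density is constant (1/4) over the square.
    y = 2.0 * rand01() - 1.0
    return x, y
```

Because `random.random()` samples from [0, 1), this stand-in can in principle return a coordinate equal to -1; for a continuous distribution that boundary has probability zero, so it does not affect uniformity.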
- You are given an aggregated histogram for a numeric variable related to search queries. Each bucket `i` contains:
  - `left_bd` (float): left boundary of the bucket
  - `right_bd` (float): right boundary of the bucket
  - `search_count` (int): number of observations in that bucket

  There are K ordered, non-overlapping buckets, and you do not have access to the raw observations. Describe how to estimate the p-th percentile, where p is between 0 and 100. First discuss a naive approach based on the bucket that contains the percentile, then propose a better approximation that uses interpolation within that bucket. State any assumptions your method relies on.
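
A hedged sketch of both estimators for the histogram question, in Python. The names `Bucket`, `percentile_naive`, and `percentile_interpolated` are hypothetical, chosen for illustration. The sketch assumes the buckets are sorted by `left_bd`, have finite boundaries, and that observations are spread uniformly inside each bucket. Returning the bucket midpoint is just one possible convention for the naive version; a boundary would serve equally well.

```python
from typing import NamedTuple


class Bucket(NamedTuple):
    left_bd: float
    right_bd: float
    search_count: int


def _find_bucket(buckets, p):
    """Return (bucket, count_before, target_rank) for the p-th percentile."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    total = sum(b.search_count for b in buckets)
    if total == 0:
        raise ValueError("histogram has no observations")
    target = p / 100 * total  # rank of the percentile among all observations
    seen = 0  # observations in buckets strictly before the current one
    hit = None
    for b in buckets:
        if b.search_count == 0:
            continue  # an empty bucket cannot contain the percentile
        hit = (b, seen)
        if seen + b.search_count >= target:
            break
        seen += b.search_count
    bucket, before = hit
    return bucket, before, target


def percentile_naive(buckets, p):
    """Naive estimate: midpoint of the bucket that contains the percentile."""
    bucket, _, _ = _find_bucket(buckets, p)
    return (bucket.left_bd + bucket.right_bd) / 2


def percentile_interpolated(buckets, p):
    """Linear interpolation inside the bucket (uniform-within-bucket assumption)."""
    bucket, before, target = _find_bucket(buckets, p)
    # Fraction of this bucket's observations lying below the target rank,
    # clamped to [0, 1] to guard against floating-point drift.
    frac = min(1.0, max(0.0, (target - before) / bucket.search_count))
    return bucket.left_bd + frac * (bucket.right_bd - bucket.left_bd)
```

For example, with buckets (0, 10, 10), (10, 20, 30), and (20, 40, 60), the median rank is 50 of 100, which falls in the third bucket after 40 earlier observations. The naive estimate is the midpoint 30, while interpolation gives 20 + (10/60) * 20, roughly 23.3. The interpolated value is only as good as the uniformity assumption: wide buckets or a heavily skewed variable can still produce a noticeable error.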