Convert a PDF to a CDF
Company: Uber
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question tests probability theory and numerical implementation: converting a sampled probability density function (PDF) into a cumulative distribution function (CDF). A strong answer distinguishes analytic from sampled PDFs and handles support, normalization, monotonicity, and degenerate inputs.
Constraints
- `1 <= len(points), len(queries) <= 200000`
- `points[i] = (x_i, y_i)` where `x_i` and `y_i` are real numbers
- After clipping negative densities to 0 and merging duplicate x-values, at least two distinct x-values remain. If the total integrated area is 0, a fallback rule applies: the CDF is 0 before the largest x and 1 at or after it
- |x_i|, |y_i|, |queries[i]| <= 1e9
Examples
Input: ([(0, 0), (1, 1), (2, 0)], [-1, 0, 1, 2, 3])
Expected Output: [0.0, 0.0, 0.5, 1.0, 1.0]
Explanation: This is a symmetric triangular PDF on [0, 2] with total area 1. The CDF at x = 1 is 0.5.
Input: ([(0, 1), (2, 1)], [0, 1, 2])
Expected Output: [0.0, 0.5, 1.0]
Explanation: The PDF is constant on [0, 2], so after normalization the midpoint has cumulative probability 0.5.
Input: ([(2, -1), (0, 1), (1, 3), (1, 2), (3, 1)], [-1, 0.5, 1.5, 2.5, 3])
Expected Output: [0.0, 0.1875, 0.78125, 0.90625, 1.0]
Explanation: After cleaning, the samples become (0,1), (1,3), (2,0), (3,1). Trapezoid areas total 4, so cumulative areas are normalized by 4.
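The cleaning and area computation in this example can be sketched as follows. This is one plausible reading of the rules (clip negatives, stable-sort by x, keep the first sample at each duplicate x) that reproduces the samples above; `clean` is an illustrative helper name, not part of the problem statement.

```python
def clean(points):
    # Clip negative densities to 0, then stable-sort by x so that among
    # duplicate x-values the earliest input sample wins (one merge rule
    # consistent with this example; the grader may specify another).
    pts = sorted(((x, max(y, 0.0)) for x, y in points), key=lambda p: p[0])
    merged = []
    for x, y in pts:
        if merged and merged[-1][0] == x:
            continue  # drop later duplicates of the same x
        merged.append((x, y))
    return merged

pts = clean([(2, -1), (0, 1), (1, 3), (1, 2), (3, 1)])
# pts -> [(0, 1.0), (1, 3.0), (2, 0.0), (3, 1.0)]

# Trapezoid area between each pair of consecutive samples.
areas = [(x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:])]
# areas -> [2.0, 1.5, 0.5], summing to the total area of 4.
```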
Input: ([(-1, 0), (1, 2), (3, 0)], [-2, 0, 2, 4])
Expected Output: [0.0, 0.125, 0.875, 1.0]
Explanation: This is another triangular PDF, centered at x = 1. Exact trapezoid integration gives the listed values.
Input: ([(0, -2), (1, 0), (2, -1)], [-1, 1, 2, 3])
Expected Output: [0.0, 0.0, 1.0, 1.0]
Explanation: All densities become 0 after cleaning, so the fallback rule applies: the CDF is 0 before max_x and 1 at or after max_x.
Hints
- After sorting by x, think of each neighboring pair of points as one trapezoid under a piecewise-linear PDF.
- Precompute prefix areas, then use binary search to locate the interval containing each query.
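A minimal sketch combining the two hints, assuming Python; the function name `pdf_to_cdf` and the exact duplicate-merge rule are illustrative choices, not given by the problem.

```python
from bisect import bisect_right

def pdf_to_cdf(points, queries):
    # Clean: clip negative densities to 0, stable-sort by x, and keep
    # the first sample at each duplicate x-value.
    pts = sorted(((x, max(y, 0.0)) for x, y in points), key=lambda p: p[0])
    xs, ys = [], []
    for x, y in pts:
        if not xs or xs[-1] != x:
            xs.append(x)
            ys.append(y)

    # Prefix areas: prefix[i] = integral of the piecewise-linear PDF
    # from xs[0] to xs[i], via trapezoids.
    prefix = [0.0]
    for i in range(1, len(xs)):
        prefix.append(prefix[-1] + (xs[i] - xs[i-1]) * (ys[i] + ys[i-1]) / 2)
    total = prefix[-1]

    out = []
    for q in queries:
        if total == 0.0:
            # Degenerate fallback: 0 before max_x, 1 at or after max_x.
            out.append(1.0 if q >= xs[-1] else 0.0)
        elif q <= xs[0]:
            out.append(0.0)
        elif q >= xs[-1]:
            out.append(1.0)
        else:
            # Binary search for the interval [xs[i], xs[i+1]] containing q.
            i = bisect_right(xs, q) - 1
            t = q - xs[i]
            slope = (ys[i+1] - ys[i]) / (xs[i+1] - xs[i])
            # Partial trapezoid from xs[i] to q under the linear segment.
            partial = t * (ys[i] + (ys[i] + slope * t)) / 2
            out.append((prefix[i] + partial) / total)
    return out
```

Each query costs O(log n) after an O(n log n) preprocessing pass, which comfortably fits the 200,000-element bounds. The sketch reproduces all five example outputs, including the zero-area fallback case.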