How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Uber.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Uber during technical interviews.

Convert a PDF to a CDF | Uber Coding Question

Quick Overview

This question evaluates understanding of probability theory, numerical methods, and algorithmic implementation for converting probability density functions into cumulative distribution functions, addressing analytic versus sampled PDFs along with support, normalization, and monotonicity requirements.

Convert a PDF to a CDF

Company: Uber

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Given a continuous random variable X with probability density function f(x), write code or describe an algorithm to construct its cumulative distribution function F(x) = P(X <= x). Handle both cases where the PDF is given in closed form and where it is available only as sampled (x, f(x)) points. Explain how you would deal with finite or infinite support, piecewise PDFs, numerical integration, normalization, and how to guarantee that the returned CDF is non-decreasing and approaches 0 and 1 at the boundaries.

Quick Answer: This question evaluates understanding of probability theory, numerical methods, and algorithmic implementation for converting probability density functions into cumulative distribution functions, addressing analytic versus sampled PDFs along with support, normalization, and monotonicity requirements.

You are given sampled values of a continuous probability density function as a list of points `(x, f(x))`. These points may come from breakpoints of a piecewise closed-form PDF or from numeric samples of a PDF whose true support may be finite or truncated from an infinite range. Construct the cumulative distribution function `F(q) = P(X <= q)` and return its value for each query point. Use the following rules: 1. Sort the points by `x`. 2. If the same `x` appears multiple times, keep the largest density value at that `x`. 3. Negative density values are numerical noise and must be treated as `0`. 4. Between consecutive sample points, assume the PDF is linearly interpolated. 5. Integrate the PDF under these line segments exactly (equivalently, use trapezoid areas). 6. If the total area is not `1`, normalize the cumulative areas by the total area. 7. For any query `q <= min_x`, return `0.0`. For any query `q >= max_x`, return `1.0`. 8. The returned CDF values must be rounded to 6 decimal places. This makes the method exact for piecewise-linear PDFs and a numerical approximation for arbitrary sampled PDFs, while ensuring the returned CDF is non-decreasing and reaches `0` and `1` at the boundaries.

Constraints

1 <= len(points), len(queries) <= 200000
`points[i] = (x_i, y_i)` where `x_i` and `y_i` are real numbers
After clipping negative densities to 0 and merging duplicate x-values, there is at least one positive-width interval and the total integrated area is positive
|x_i|, |y_i|, |queries[i]| <= 1e9

Examples

Input: ([(0, 0), (1, 1), (2, 0)], [-1, 0, 1, 2, 3])

Expected Output: [0.0, 0.0, 0.5, 1.0, 1.0]

Explanation: This is a symmetric triangular PDF on [0, 2] with total area 1. The CDF at x = 1 is 0.5.

Input: ([(0, 1), (2, 1)], [0, 1, 2])

Expected Output: [0.0, 0.5, 1.0]

Explanation: The PDF is constant on [0, 2], so after normalization the midpoint has cumulative probability 0.5.

Input: ([(2, -1), (0, 1), (1, 3), (1, 2), (3, 1)], [-1, 0.5, 1.5, 2.5, 3])

Expected Output: [0.0, 0.1875, 0.78125, 0.90625, 1.0]

Explanation: After cleaning, the samples become (0,1), (1,3), (2,0), (3,1). Trapezoid areas total 4, so cumulative areas are normalized by 4.

Input: ([(-1, 0), (1, 2), (3, 0)], [-2, 0, 2, 4])

Expected Output: [0.0, 0.125, 0.875, 1.0]

Explanation: This is another triangular PDF, centered at x = 1. Exact trapezoid integration gives the listed values.

Input: ([(0, -2), (1, 0), (2, -1)], [-1, 1, 2, 3])

Expected Output: [0.0, 0.0, 1.0, 1.0]

Explanation: All densities become 0 after cleaning, so the fallback rule applies: the CDF is 0 before max_x and 1 at or after max_x.

Hints

After sorting by x, think of each neighboring pair of points as one trapezoid under a piecewise-linear PDF.
Precompute prefix areas, then use binary search to locate the interval containing each query.

Quick Overview