PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches

Quick Overview

This question evaluates understanding of probability theory, numerical methods, and algorithmic implementation for converting probability density functions into cumulative distribution functions, addressing analytic versus sampled PDFs along with support, normalization, and monotonicity requirements.

  • medium
  • Uber
  • Coding & Algorithms
  • Data Scientist

Convert a PDF to a CDF

Company: Uber

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Given a continuous random variable X with probability density function f(x), write code or describe an algorithm to construct its cumulative distribution function F(x) = P(X <= x). Handle both cases where the PDF is given in closed form and where it is available only as sampled (x, f(x)) points. Explain how you would deal with finite or infinite support, piecewise PDFs, numerical integration, normalization, and how to guarantee that the returned CDF is non-decreasing and approaches 0 and 1 at the boundaries.

Quick Answer: This question evaluates understanding of probability theory, numerical methods, and algorithmic implementation for converting probability density functions into cumulative distribution functions, addressing analytic versus sampled PDFs along with support, normalization, and monotonicity requirements.

You are given sampled values of a continuous probability density function as a list of points `(x, f(x))`. These points may come from breakpoints of a piecewise closed-form PDF or from numeric samples of a PDF whose true support may be finite or truncated from an infinite range. Construct the cumulative distribution function `F(q) = P(X <= q)` and return its value for each query point. Use the following rules: 1. Sort the points by `x`. 2. If the same `x` appears multiple times, keep the largest density value at that `x`. 3. Negative density values are numerical noise and must be treated as `0`. 4. Between consecutive sample points, assume the PDF is linearly interpolated. 5. Integrate the PDF under these line segments exactly (equivalently, use trapezoid areas). 6. If the total area is not `1`, normalize the cumulative areas by the total area. 7. For any query `q <= min_x`, return `0.0`. For any query `q >= max_x`, return `1.0`. 8. The returned CDF values must be rounded to 6 decimal places. This makes the method exact for piecewise-linear PDFs and a numerical approximation for arbitrary sampled PDFs, while ensuring the returned CDF is non-decreasing and reaches `0` and `1` at the boundaries.

Constraints

  • 1 <= len(points), len(queries) <= 200000
  • `points[i] = (x_i, y_i)` where `x_i` and `y_i` are real numbers
  • After clipping negative densities to 0 and merging duplicate x-values, there is at least one positive-width interval and the total integrated area is positive
  • |x_i|, |y_i|, |queries[i]| <= 1e9

Examples

Input: ([(0, 0), (1, 1), (2, 0)], [-1, 0, 1, 2, 3])

Expected Output: [0.0, 0.0, 0.5, 1.0, 1.0]

Explanation: This is a symmetric triangular PDF on [0, 2] with total area 1. The CDF at x = 1 is 0.5.

Input: ([(0, 1), (2, 1)], [0, 1, 2])

Expected Output: [0.0, 0.5, 1.0]

Explanation: The PDF is constant on [0, 2], so after normalization the midpoint has cumulative probability 0.5.

Input: ([(2, -1), (0, 1), (1, 3), (1, 2), (3, 1)], [-1, 0.5, 1.5, 2.5, 3])

Expected Output: [0.0, 0.1875, 0.78125, 0.90625, 1.0]

Explanation: After cleaning, the samples become (0,1), (1,3), (2,0), (3,1). Trapezoid areas total 4, so cumulative areas are normalized by 4.

Input: ([(-1, 0), (1, 2), (3, 0)], [-2, 0, 2, 4])

Expected Output: [0.0, 0.125, 0.875, 1.0]

Explanation: This is another triangular PDF, centered at x = 1. Exact trapezoid integration gives the listed values.

Input: ([(0, -2), (1, 0), (2, -1)], [-1, 1, 2, 3])

Expected Output: [0.0, 0.0, 1.0, 1.0]

Explanation: All densities become 0 after cleaning, so the fallback rule applies: the CDF is 0 before max_x and 1 at or after max_x.

Hints

  1. After sorting by x, think of each neighboring pair of points as one trapezoid under a piecewise-linear PDF.
  2. Precompute prefix areas, then use binary search to locate the interval containing each query.
Last updated: Apr 30, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement Minesweeper and Word Search - Uber (medium)
  • Implement Store Autocomplete - Uber (medium)
  • Implement Cache Eviction And Seat Assignment - Uber (medium)
  • Schedule Non-Overlapping Meetings Efficiently - Uber (hard)
  • Evaluate an Arithmetic Expression - Uber