Implement monotonic-array linear interpolation
Company: Tradedesk
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Technical Screen
##### Question
Implement a function `interpolate(array_x, array_y, x)` that returns the linearly interpolated y-value for a query x-value, given paired data points.
You are given:
- `array_x`: a **strictly increasing** (monotonic, no duplicates) list/array of numbers, length `n >= 2`
- `array_y`: a list/array of numbers of the **same length** as `array_x`
- `x`: a query number that may be **inside or outside** the range `[array_x[0], array_x[n-1]]`
Assume both arrays fit in memory.
**Tasks**
1. Return the **piecewise-linear interpolation** of the y-value when `x` is within the domain.
- If `x == array_x[i]`, return `array_y[i]` exactly.
- Otherwise, find the two surrounding points `(x0, y0)` and `(x1, y1)` such that `x0 < x < x1` and return the linear interpolation between them.
2. Discuss and/or implement the behavior when `x` is **outside** `[array_x[0], array_x[-1]]`. Consider at least these options:
- **Option A:** raise an error (no extrapolation)
- **Option B:** clamp to the nearest boundary y-value
- **Option C:** extrapolate (extend the nearest segment linearly)
3. Analyze the **time complexity** of your approach and name at least one **alternative to a binary-search (`bisect`)** strategy for locating the interval.
**Follow-ups**
1. Instead of raising an error for out-of-range queries, what other values could be returned (e.g., `None`, `NaN`), and when is each appropriate?
2. In which real-world cases does each out-of-range behavior (error / clamp / extrapolate) make sense? Give pros and cons.
3. For `x = 100` and `array_x = [2, 4, 6, 8, 11]`, what is the safest behavior and why?
4. What edge cases must you handle (very small arrays, exact matches at endpoints or interior, float precision)?
**Constraints**
- Target time complexity: **O(log n)** for in-range queries (preferred).
- Handle edge cases robustly.
Quick Answer: This question evaluates proficiency with piecewise-linear interpolation on strictly increasing (monotonic) arrays, sound handling of out-of-range queries (error, clamp, extrapolate, or None/NaN), and efficient interval search over sorted data with a target of O(log n). It also probes real-world judgment about when each out-of-range policy is appropriate and which non-bisect strategies exist.
Solution
## Core idea
Because `array_x` is strictly increasing, locating the interval that brackets `x` is a search over sorted data, and within that interval the value is a straight-line (linear) interpolation between the two endpoints.
For `x` between consecutive points `(x0, y0)` and `(x1, y1)` with `x0 <= x <= x1`:
```
y = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```
## In-range implementation (O(log n) with bisect)
Use binary search to find the insertion position, then interpolate. This gives O(log n) per query.
```python
from bisect import bisect_left
def interpolate(array_x, array_y, x, out_of_range="error"):
n = len(array_x)
if n != len(array_y):
raise ValueError("array_x and array_y must have the same length")
if n < 2:
raise ValueError("need at least two points")
lo, hi = array_x[0], array_x[-1]
# --- Out-of-range handling ---
if x < lo or x > hi:
if out_of_range == "error":
raise ValueError(f"x={x} is outside [{lo}, {hi}]")
if out_of_range == "clamp":
return array_y[0] if x < lo else array_y[-1]
if out_of_range == "none":
return None
if out_of_range == "nan":
return float("nan")
if out_of_range == "extrapolate":
if x < lo:
x0, x1, y0, y1 = array_x[0], array_x[1], array_y[0], array_y[1]
else:
x0, x1, y0, y1 = array_x[-2], array_x[-1], array_y[-2], array_y[-1]
return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
raise ValueError(f"unknown out_of_range mode: {out_of_range}")
# --- In-range: locate bracketing interval via binary search ---
i = bisect_left(array_x, x)
if i < n and array_x[i] == x: # exact hit (covers both endpoints and interior)
return array_y[i]
# bisect_left returns the first index with array_x[i] >= x; here array_x[i] > x,
# so the bracket is (i-1, i). i >= 1 is guaranteed because x > array_x[0].
x0, x1 = array_x[i - 1], array_x[i]
y0, y1 = array_y[i - 1], array_y[i]
return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```
## Out-of-range semantics (Task 2 + follow-ups 1–3)
- **Raise an error (A):** safest default when extrapolation is meaningless or dangerous. Forces the caller to acknowledge the query is outside the measured domain.
- **Clamp (B):** return the nearest endpoint y-value. Reasonable when the quantity is bounded/saturating (e.g., a calibration table where beyond the last measured point the response is known to flatten).
- **Extrapolate (C):** extend the nearest segment linearly. Useful when the underlying relationship is genuinely linear and you trust it beyond the sampled range — but it is the riskiest because error grows with distance and the linear trend may not hold.
- **Return `None` / `NaN` instead of raising:** appropriate when out-of-range is expected and frequent and you want to keep going rather than throw. `NaN` is convenient in numeric/array pipelines (e.g., NumPy/pandas) because it propagates and is easy to mask later; `None` suits plain Python control flow where you branch on a sentinel. Use these over an exception when a missing/undefined result is a normal outcome, not an error.
- **`x = 100` with `array_x = [2, 4, 6, 8, 11]`:** 100 is far outside `[2, 11]`. The **safest behavior is to raise an error** (or return `NaN`/`None`). Linear extrapolation that far past the last point (from segment 8→11) has no support in the data and would produce a wildly unreliable value; clamping to `array_y[-1]` silently hides that the query is out of domain. Failing loudly is preferable unless you have a documented reason to extrapolate.
## Complexity and an alternative to binary search (Task 3)
- The bisect approach is **O(log n)** per query (binary search) plus O(1) arithmetic; O(1) extra space.
- **Alternatives to `bisect`:**
- **Linear scan** of `array_x` to find the bracketing interval — O(n) per query; simplest, fine for tiny arrays.
- **Vectorized search**, e.g. `numpy.searchsorted` (binary search under the hood) or `numpy.interp` directly — efficient when interpolating many query points at once.
- **Hashing / direct indexing** when `array_x` is on a known regular grid: compute the interval index arithmetically as `i = floor((x - x0) / step)` in O(1), no search needed.
- **Interpolation search**: estimate the position from the value distribution; ~O(log log n) on uniformly distributed keys (worse case O(n)).
## Edge cases (follow-up 4)
- **Length mismatch** between `array_x` and `array_y` → validate and raise.
- **n < 2** → cannot interpolate; raise.
- **Exact matches** at endpoints and interior points → return the stored y exactly (the `array_x[i] == x` check), avoiding needless float arithmetic.
- **Float precision:** exact equality on floats is fragile; if inputs are noisy floats, consider a tolerance (`abs(array_x[i] - x) <= eps`) for the exact-match check. The interpolation formula uses `(x1 - x0)` in the denominator — guaranteed nonzero here because x-values are strictly increasing (no duplicates).
- **Querying exactly at a boundary** (`x == array_x[0]` or `x == array_x[-1]`) must be treated as in-range, not out-of-range.
Explanation
Two responsibilities: (1) locate the interval that brackets x in a sorted array — O(log n) via binary search (bisect/searchsorted), or O(1) on a regular grid, or O(n) linear scan; (2) apply the straight-line formula y = y0 + (y1-y0)*(x-x0)/(x1-x0). The judgment content is the out-of-range policy (error vs clamp vs extrapolate vs None/NaN) and knowing that far-out queries like x=100 on [2,11] should fail loudly rather than be silently extrapolated or clamped.