Implement Euclidean distance and k-nearest neighbors
Company: Pitchbook
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Onsite
You are working with points in an n-dimensional Euclidean space, where each point is represented as a list (or array) of numeric coordinates.
Assume:
- Each point is represented as `list[float]`.
- For any given operation, all points involved have the same dimensionality (same length of the list).
- For the k-nearest-neighbors function, assume `1 <= k <= len(dataset)`.
- If two points are at exactly the same distance from the query point, break ties by smaller index in the dataset.
---
### 1) Compute Euclidean distance between two n-dimensional points
Write a function that computes the Euclidean distance between two points in n-dimensional space.
The function should:
- Accept two points as input, e.g. `p: list[float]` and `q: list[float]`.
- Handle points of arbitrary dimensions (as long as both points have the same dimension).
- Return the Euclidean distance as a `float`.
The Euclidean distance between points `p` and `q` in n dimensions is:
$$
\text{distance}(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
$$
**Examples:**
- Distance between `[1, 2]` and `[4, 6]` in 2D space.
- Distance between `[1, 2, 3]` and `[4, 5, 6]` in 3D space.
You do not need to print anything; just implement the function that returns the distance.
---
### 2) Find k-nearest neighbors using Euclidean distance
Using the Euclidean distance function from part (1) as the distance metric, write a function to find the k-nearest neighbors of a query point within a dataset.
The function should:
- Accept as input:
- `query_vector: list[float]` — the query point.
- `k: int` — the number of nearest neighbors to return.
- `dataset: list[list[float]]` — a list of data points, where each data point is a list of floats of the same dimension as `query_vector`.
- Compute the Euclidean distance from the `query_vector` to each point in `dataset`.
- Return a `list[int]` of the indices of the **k nearest neighbors** in `dataset`, **sorted in order of increasing distance** to the query.
- If two points have the same distance, return the point with the smaller index first.
**Example dataset:**
```python
dataset = [
[10, 11, 12], # index 0
[1, 2, 3], # index 1
[4, 5, 6], # index 2
[7, 8, 9], # index 3
[13, 14, 15] # index 4
]
```
Given some `query_vector` and a value of `k`, your function should return the indices of the k closest points from this dataset, ordered from closest to farthest.
Quick Answer: This question evaluates understanding of Euclidean distance metrics, vector arithmetic, and nearest-neighbor retrieval, testing implementation skills for computing distances and selecting the k closest points.
Euclidean Distance
Return the Euclidean distance between two equal-dimensional points, rounded to six decimals.
Constraints
- p and q have the same dimension
Examples
Input: ([1, 2], [4, 6])
Expected Output: 5.0
Explanation: 3-4-5 triangle.
Input: ([1, 2, 3], [4, 5, 6])
Expected Output: 5.196152
Explanation: 3D distance.
Input: ([], [])
Expected Output: 0.0
Explanation: Zero-dimensional points.
Input: ([-1, 0], [1, 0])
Expected Output: 2.0
Explanation: Negative coordinate.
Hints
- Sum squared coordinate differences and take the square root.
k-Nearest Neighbor Indices
Return the indices of the k closest dataset points to the query, breaking distance ties by smaller index.
Examples
Input: ([0, 0], 2, [[1, 0], [0, 2], [0, 1]])
Expected Output: [0, 2]
Explanation: Two closest points.
Input: ([1, 1], 3, [[2, 1], [1, 2], [0, 1]])
Expected Output: [0, 1, 2]
Explanation: Tie by index.
Input: ([1, 2, 3], 1, [[10, 10, 10], [1, 2, 4]])
Expected Output: [1]
Explanation: 3D query.
Hints
- Sort by squared distance and index; square root is unnecessary for ordering.