Implement Euclidean distance and k-nearest neighbors
Company: Pitchbook
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Onsite
You are working with points in an n-dimensional Euclidean space, where each point is represented as a list (or array) of numeric coordinates.
Assume:
- Each point is represented as `list[float]`.
- For any given operation, all points involved have the same dimensionality (same length of the list).
- For the k-nearest-neighbors function, assume `1 <= k <= len(dataset)`.
- If two points are at exactly the same distance from the query point, break ties by smaller index in the dataset.
---
### 1) Compute Euclidean distance between two n-dimensional points
Write a function that computes the Euclidean distance between two points in n-dimensional space.
The function should:
- Accept two points as input, e.g. `p: list[float]` and `q: list[float]`.
- Handle points of arbitrary dimensions (as long as both points have the same dimension).
- Return the Euclidean distance as a `float`.
The Euclidean distance between points `p` and `q` in n dimensions is:
$$
\text{distance}(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
$$
**Examples:**
- Distance between `[1, 2]` and `[4, 6]` in 2D space.
- Distance between `[1, 2, 3]` and `[4, 5, 6]` in 3D space.
You do not need to print anything; just implement the function that returns the distance.
---
### 2) Find k-nearest neighbors using Euclidean distance
Using the Euclidean distance function from part (1) as the distance metric, write a function to find the k-nearest neighbors of a query point within a dataset.
The function should:
- Accept as input:
- `query_vector: list[float]` — the query point.
- `k: int` — the number of nearest neighbors to return.
- `dataset: list[list[float]]` — a list of data points, where each data point is a list of floats of the same dimension as `query_vector`.
- Compute the Euclidean distance from the `query_vector` to each point in `dataset`.
- Return a `list[int]` of the indices of the **k nearest neighbors** in `dataset`, **sorted in order of increasing distance** to the query.
- If two points have the same distance, return the point with the smaller index first.
**Example dataset:**
```python
dataset = [
[10, 11, 12], # index 0
[1, 2, 3], # index 1
[4, 5, 6], # index 2
[7, 8, 9], # index 3
[13, 14, 15] # index 4
]
```
Given some `query_vector` and a value of `k`, your function should return the indices of the k closest points from this dataset, ordered from closest to farthest.
Quick Answer: This question evaluates understanding of Euclidean distance metrics, vector arithmetic, and nearest-neighbor retrieval, testing implementation skills for computing distances and selecting the k closest points.