Implement K-means clustering from scratch

Q: Implement K-means clustering from scratch

This is a Coding & Algorithms interview question from Microsoft for Machine Learning Engineer roles. View the full question and solution on PracHub.

Q: How do I approach Coding & Algorithms interview questions?

Coding & Algorithms questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master coding & algorithms interviews.

Question

Loading...

Task: Implement K-means clustering (from scratch)

Write a function to perform K-means clustering on a set of points.

Input

A dataset X with shape (n_samples, n_features) (e.g., a NumPy array or PyTorch tensor).
An integer k = number of clusters.
Optional parameters:
- max_iters (e.g., 100)
- tol for convergence (e.g., 1e-4)
- Initialization method (e.g., random points from X )

Output

Return:

centroids : shape (k, n_features)
labels : length n_samples , where each label is in [0, k-1]

Requirements / Discussion Points

Describe and implement the two core steps iteratively:
1. Assignment step : assign each point to the nearest centroid (e.g., Euclidean distance).
2. Update step : recompute each centroid as the mean of assigned points.
Define a stopping condition (e.g., centroid shift < tol or max_iters reached).
Handle edge cases (at least verbally), e.g.:
- A cluster gets no points in an iteration (empty cluster).
- k > n_samples .
You may use Python with NumPy and/or PyTorch , but do not call a library K-means implementation.

(In the interview, correctness and reasoning matter more than making it fully runnable.)

Implement K-means clustering from scratch

Task: Implement K-means clustering (from scratch)

Input

Output

Requirements / Discussion Points

Comments (0)