Task: Implement K-means clustering (from scratch)
Write a function to perform K-means clustering on a set of points.
Input
-
A dataset
X
with shape
(n_samples, n_features)
(e.g., a NumPy array or PyTorch tensor).
-
An integer
k
= number of clusters.
-
Optional parameters:
-
max_iters
(e.g., 100)
-
tol
for convergence (e.g., 1e-4)
-
Initialization method (e.g., random points from
X
)
Output
Return:
-
centroids
: shape
(k, n_features)
-
labels
: length
n_samples
, where each label is in
[0, k-1]
Requirements / Discussion Points
-
Describe and implement the two core steps iteratively:
-
Assignment step
: assign each point to the nearest centroid (e.g., Euclidean distance).
-
Update step
: recompute each centroid as the mean of assigned points.
-
Define a
stopping condition
(e.g., centroid shift <
tol
or
max_iters
reached).
-
Handle edge cases (at least verbally), e.g.:
-
A cluster gets
no points
in an iteration (empty cluster).
-
k > n_samples
.
-
You may use
Python with NumPy and/or PyTorch
, but do not call a library K-means implementation.
(In the interview, correctness and reasoning matter more than making it fully runnable.)