Implement K-Means and compare with GMM
Company: Tubitv
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Implement K-Means clustering from scratch for a dataset `X` of shape `(n_samples, n_features)` and a target number of clusters `k`.
Your implementation should:
- initialize `k` centroids,
- assign each point to its nearest centroid,
- recompute centroids as the mean of assigned points,
- repeat until convergence or a maximum number of iterations,
- handle edge cases such as empty clusters,
- and include a test or check for convergence, such as unchanged assignments, centroid movement below a tolerance, or non-increasing within-cluster loss.
After coding, answer these follow-up questions:
- How does Gaussian Mixture Modeling differ from K-Means?
- How would you train a Gaussian mixture model using the EM algorithm?
- In what situations would you prefer GMM over K-Means?
Quick Answer: This question evaluates understanding and practical implementation of clustering algorithms, specifically K-Means centroid initialization, assignment and update mechanics, convergence criteria and edge-case handling, along with comparative knowledge of Gaussian Mixture Models and the Expectation-Maximization training framework.