Implement K-means and run two iterations
Company: TikTok
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of k-means clustering concepts (k-means++ initialization, Lloyd iterations, and SSE) and proficiency in numerical implementation and vectorized array operations.
Constraints
- Ties in assignment choose the lower-index centroid.
- Empty clusters keep their previous centroid.
- Centroids and SSE are rounded to 6 decimals for exact grading.
Examples
Input: ([(0,0),(0,2),(2,0),(2,2),(8,8),(8,10),(10,8),(10,10)], 2, 42, 2)
Expected Output: {'initial_centers': [[0, 2], [2, 2]], 'iterations': [{'clusters': [[[0, 0], [0, 2]], [[2, 0], [2, 2], [8, 8], [8, 10], [10, 8], [10, 10]]], 'centroids': [[0.0, 1.0], [6.666667, 6.333333]]}, {'clusters': [[[0, 0], [0, 2], [2, 0], [2, 2]], [[8, 8], [8, 10], [10, 8], [10, 10]]], 'centroids': [[1.0, 1.0], [9.0, 9.0]]}], 'final_centroids': [[1.0, 1.0], [9.0, 9.0]], 'sse': 16.0}
Explanation: Run deterministic k-means++ initialization and two Lloyd iterations.
Input: ([(0,0),(10,0)], 2, 1, 1)
Expected Output: {'initial_centers': [[0, 0], [10, 0]], 'iterations': [{'clusters': [[[0, 0]], [[10, 0]]], 'centroids': [[0.0, 0.0], [10.0, 0.0]]}], 'final_centroids': [[0.0, 0.0], [10.0, 0.0]], 'sse': 0.0}
Explanation: Two points and two clusters stay separated.
Hints
- Clarify edge cases before coding.
- Keep the return value deterministic.