How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at TikTok.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at TikTok during technical interviews.

Implement K-means and run two iterations

Quick Overview

This question evaluates understanding of k-means clustering concepts (k-means++ initialization, Lloyd iterations, and SSE) and proficiency in numerical implementation and vectorized array operations.

Implement K-means and run two iterations

Company: TikTok

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Given points P={(0,0),(0,2),(2,0),(2,2),(8,8),(8,10),(10,8),(10,10)} and k=2, (1) initialize centroids with k-means++ using seed=42 and Euclidean distance; show the sampled centers. (2) Perform exactly two Lloyd iterations (assign → update) by hand, breaking ties by choosing the lower-index centroid; list cluster memberships and centroids after each iteration. (3) Compute SSE after the second iteration. (4) Describe how you would implement this efficiently in vectorized NumPy (no loops over points), including the time complexity per iteration, and explain how you would handle empty clusters robustly.

Quick Answer: This question evaluates understanding of k-means clustering concepts (k-means++ initialization, Lloyd iterations, and SSE) and proficiency in numerical implementation and vectorized array operations.

Run deterministic k-means++ with random.Random(seed), then the requested number of Lloyd iterations, returning clusters, centroids, and SSE.

Constraints

Ties in assignment choose the lower-index centroid.
Empty clusters keep their previous centroid.
Centroids and SSE are rounded to 6 decimals for exact grading.

Examples

Input: ([(0,0),(0,2),(2,0),(2,2),(8,8),(8,10),(10,8),(10,10)], 2, 42, 2)

Expected Output: {'initial_centers': [[0, 2], [2, 2]], 'iterations': [{'clusters': [[[0, 0], [0, 2]], [[2, 0], [2, 2], [8, 8], [8, 10], [10, 8], [10, 10]]], 'centroids': [[0.0, 1.0], [6.666667, 6.333333]]}, {'clusters': [[[0, 0], [0, 2], [2, 0], [2, 2]], [[8, 8], [8, 10], [10, 8], [10, 10]]], 'centroids': [[1.0, 1.0], [9.0, 9.0]]}], 'final_centroids': [[1.0, 1.0], [9.0, 9.0]], 'sse': 16.0}

Explanation: Run deterministic k-means++ initialization and two Lloyd iterations.

Input: ([(0,0),(10,0)], 2, 1, 1)

Expected Output: {'initial_centers': [[0, 0], [10, 0]], 'iterations': [{'clusters': [[[0, 0]], [[10, 0]]], 'centroids': [[0.0, 0.0], [10.0, 0.0]]}], 'final_centroids': [[0.0, 0.0], [10.0, 0.0]], 'sse': 0.0}

Explanation: Two points and two clusters stay separated.

Hints

Clarify edge cases before coding.
Keep the return value deterministic.

Last updated: Jun 27, 2026

Loading coding console...