PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of k-means clustering concepts (k-means++ initialization, Lloyd iterations, and SSE) and proficiency in numerical implementation and vectorized array operations.

  • Medium
  • TikTok
  • Coding & Algorithms
  • Data Scientist

Implement K-means and run two iterations

Company: TikTok

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Given points P={(0,0),(0,2),(2,0),(2,2),(8,8),(8,10),(10,8),(10,10)} and k=2, (1) initialize centroids with k-means++ using seed=42 and Euclidean distance; show the sampled centers. (2) Perform exactly two Lloyd iterations (assign → update) by hand, breaking ties by choosing the lower-index centroid; list cluster memberships and centroids after each iteration. (3) Compute SSE after the second iteration. (4) Describe how you would implement this efficiently in vectorized NumPy (no loops over points), including the time complexity per iteration, and explain how you would handle empty clusters robustly.

Quick Answer: This question evaluates understanding of k-means clustering concepts (k-means++ initialization, Lloyd iterations, and SSE) and proficiency in numerical implementation and vectorized array operations.

Run deterministic k-means++ with random.Random(seed), then the requested number of Lloyd iterations, returning clusters, centroids, and SSE.

Constraints

  • Ties in assignment choose the lower-index centroid.
  • Empty clusters keep their previous centroid.
  • Centroids and SSE are rounded to 6 decimals for exact grading.

Examples

Input: ([(0,0),(0,2),(2,0),(2,2),(8,8),(8,10),(10,8),(10,10)], 2, 42, 2)

Expected Output: {'initial_centers': [[0, 2], [2, 2]], 'iterations': [{'clusters': [[[0, 0], [0, 2]], [[2, 0], [2, 2], [8, 8], [8, 10], [10, 8], [10, 10]]], 'centroids': [[0.0, 1.0], [6.666667, 6.333333]]}, {'clusters': [[[0, 0], [0, 2], [2, 0], [2, 2]], [[8, 8], [8, 10], [10, 8], [10, 10]]], 'centroids': [[1.0, 1.0], [9.0, 9.0]]}], 'final_centroids': [[1.0, 1.0], [9.0, 9.0]], 'sse': 16.0}

Explanation: Run deterministic k-means++ initialization and two Lloyd iterations.

Input: ([(0,0),(10,0)], 2, 1, 1)

Expected Output: {'initial_centers': [[0, 0], [10, 0]], 'iterations': [{'clusters': [[[0, 0]], [[10, 0]]], 'centroids': [[0.0, 0.0], [10.0, 0.0]]}], 'final_centroids': [[0.0, 0.0], [10.0, 0.0]], 'sse': 0.0}

Explanation: Two points and two clusters stay separated.

Hints

  1. Clarify edge cases before coding.
  2. Keep the return value deterministic.
Last updated: Jun 27, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Parse a nested list from a string - TikTok (medium)
  • Implement stacks, streaming median, and upward path sum - TikTok (easy)
  • Implement stack variants and path-sum check - TikTok (medium)
  • Find the longest palindromic substring - TikTok (easy)
  • Solve common string/DP/stack problems - TikTok (medium)