PracHub

Implement robust k-means with k-means++ initialization

Last updated: Mar 29, 2026

Quick Overview

This question evaluates implementation and understanding of clustering algorithms (k-means with k-means++), vectorized numerical computing with NumPy, handling of edge cases such as empty clusters and sample weighting, and algorithmic complexity analysis in the Coding & Algorithms domain for Data Scientist roles.



Company: Tencent

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Implement from scratch in Python (no scikit-learn) a function kmeans(X, k, max_iter=300, tol=1e-4, random_state=0) that returns (centroids, labels).

Requirements:

  a) Use k-means++ initialization.
  b) Use vectorized NumPy operations for distance computation.
  c) Stop early when the maximum centroid shift is < tol.
  d) Handle empty clusters by re-seeding to the point with the largest current assignment distance.
  e) Support an optional sample_weights array that reweights both assignment and centroid updates.
  f) Ensure deterministic behavior with random_state.
  g) Analyze time and space complexity in terms of n samples, d dimensions, and k clusters.
  h) Explain how you would test correctness (e.g., on simple 2D blobs) and diagnose convergence issues (e.g., inertia not decreasing).

Provide the function signature, docstring, and well-commented code.
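
One possible answer is sketched below. It is a minimal illustration, not a reference solution. The helper names _sq_dists and _kmeanspp_init are hypothetical, and sample_weights is added as a keyword argument because the stated signature does not include it. The sketch also interprets the weighting requirement in one specific way: a per-sample weight cannot change which centroid is nearest, so the weights are applied to the seeding probabilities, the centroid means, and the re-seeding criterion for empty clusters.

```python
import numpy as np


def _sq_dists(X, C):
    """Squared Euclidean distances, shape (n, k), via ||x||^2 - 2 x.c + ||c||^2."""
    d2 = (X * X).sum(axis=1)[:, None] - 2.0 * (X @ C.T) + (C * C).sum(axis=1)[None, :]
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def _kmeanspp_init(X, k, w, rng):
    """k-means++ seeding: draw each new centre with probability proportional to w * D(x)^2."""
    n = X.shape[0]
    centroids = np.empty((k, X.shape[1]))
    centroids[0] = X[rng.choice(n, p=w / w.sum())]
    closest = _sq_dists(X, centroids[:1]).ravel()
    for j in range(1, k):
        probs = w * closest
        total = probs.sum()
        # If every point coincides with a chosen centre, fall back to the weights.
        probs = probs / total if total > 0 else w / w.sum()
        centroids[j] = X[rng.choice(n, p=probs)]
        closest = np.minimum(closest, _sq_dists(X, centroids[j:j + 1]).ravel())
    return centroids


def kmeans(X, k, max_iter=300, tol=1e-4, random_state=0, sample_weights=None):
    """Lloyd's k-means with k-means++ seeding.

    Returns (centroids, labels) with shapes (k, d) and (n,).
    sample_weights (an assumption of this sketch) must be non-negative.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n_samples")
    w = np.ones(n) if sample_weights is None else np.asarray(sample_weights, dtype=float)
    if w.shape != (n,) or np.any(w < 0) or w.sum() <= 0:
        raise ValueError("sample_weights must be non-negative, length n, with positive sum")

    rng = np.random.default_rng(random_state)  # fixed seed gives repeatable runs
    centroids = _kmeanspp_init(X, k, w, rng)

    for _ in range(max_iter):
        d2 = _sq_dists(X, centroids)
        labels = d2.argmin(axis=1)
        point_cost = w * d2[np.arange(n), labels]  # weighted assignment distance

        # Weighted mean of each cluster, computed without a Python loop over points.
        mass = np.bincount(labels, weights=w, minlength=k)
        sums = np.zeros_like(centroids)
        np.add.at(sums, labels, X * w[:, None])
        new_centroids = centroids.copy()
        filled = mass > 0
        new_centroids[filled] = sums[filled] / mass[filled, None]

        # Re-seed each empty cluster at the currently worst-fitted point.
        for j in np.flatnonzero(~filled):
            far = int(point_cost.argmax())
            new_centroids[j] = X[far]
            point_cost[far] = 0.0  # do not pick the same point twice

        shift = np.sqrt(((new_centroids - centroids) ** 2).sum(axis=1)).max()
        centroids = new_centroids
        if shift < tol:
            break

    # Final labels are recomputed so they match the returned centroids.
    labels = _sq_dists(X, centroids).argmin(axis=1)
    return centroids, labels
```

In this sketch each iteration costs O(n k d) time, dominated by the matrix product inside _sq_dists, and the seeding pass costs O(n k d) in total. Extra memory is O(n k) for the distance matrix, on top of O(n d) for the data and O(k d) for the centroids. For testing, two well-separated 2D blobs with k=2 should yield one label per blob and centroids near the blob means, and repeated calls with the same random_state should return identical arrays. To diagnose convergence, one option is to record point_cost.sum() (the weighted inertia) on each iteration and check that it does not increase between iterations in which no cluster was re-seeded.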


Oct 13, 2025, 9:49 PM

