PracHub

Evaluate and select K in K-means

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in K-means clustering, covering core algorithmic assumptions, initialization effects, methods for selecting K, preprocessing needs for scaling and outliers, and business-focused post hoc validation of segments.



Company: Other

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite

Explain K-means and its assumptions. (a) Compare random initialization vs. k-means++ and their impact on convergence. (b) Provide two methods to choose K (from silhouette, elbow, and BIC) and explain their failure modes under non-spherical clusters or clusters of different densities. (c) Given feature scaling issues and outliers, propose preprocessing steps. (d) Describe how you would evaluate cluster usefulness for a marketing segmentation problem using business-oriented post hoc validation.


Oct 13, 2025, 9:49 PM

K-means Clustering: Concepts, Initialization, Model Selection, Preprocessing, and Business Validation

Context: You are clustering customer data with numeric features (e.g., RFM, engagement, product usage) to build marketing segments. Assume standard K-means (Euclidean distance) unless noted.

  1. Explain K-means and its core assumptions.

(a) Compare random initialization vs. k-means++ and discuss their impact on convergence and solution quality.

(b) Provide two methods to choose K (from silhouette, elbow, BIC). Explain how and why these methods can fail under non-spherical clusters or clusters with different densities/sizes.

(c) Given feature scaling issues and outliers, propose concrete preprocessing steps before running K-means.

(d) Describe how you would evaluate whether the clusters are useful for a marketing segmentation problem, including business-oriented post hoc validation beyond internal clustering metrics.
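As a starting point for parts (a) to (c), the following is a minimal sketch in standard-library Python, not a reference solution. It standardizes features, runs Lloyd's algorithm with either random or k-means++ seeding, and records the within-cluster sum of squares (inertia) for several values of K, which is the quantity an elbow plot is drawn from. The synthetic (spend, visits) data, the blob centers, and all function names are assumptions made for illustration; outlier handling, silhouette, and BIC are left out. In practice a library implementation such as scikit-learn's KMeans would normally be used instead.

```python
import random

def standardize(rows):
    """Z-score each column so no single feature dominates Euclidean distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_random(points, k, rng):
    """Pick k distinct data points uniformly at random as starting centers."""
    return [list(p) for p in rng.sample(points, k)]

def init_kmeanspp(points, k, rng):
    """k-means++ seeding: sample each new center with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers

def assign(points, centers):
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, init, rng, max_iter=100):
    """Lloyd's algorithm; returns (centers, labels, inertia)."""
    centers = init(points, k, rng)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # An empty cluster keeps its previous center.
            new_centers.append([sum(c) / len(members) for c in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:  # assignments stopped changing
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, inertia

def best_of(points, k, init, rng, n_init=5):
    """Keep the lowest-inertia run out of n_init restarts."""
    return min((kmeans(points, k, init, rng) for _ in range(n_init)),
               key=lambda r: r[2])

# Hypothetical customer data: three segments on very different scales.
rng = random.Random(0)
blobs = [(100.0, 2.0), (500.0, 10.0), (900.0, 3.0)]  # assumed (spend, visits)
raw = [[rng.gauss(mx, 30.0), rng.gauss(my, 0.8)]
       for mx, my in blobs for _ in range(50)]
X = standardize(raw)

# Elbow input: best inertia for each candidate K.
inertia_by_k = {k: best_of(X, k, init_kmeanspp, rng)[2] for k in range(1, 7)}

# Initialization comparison: single-run inertia at K=3 across ten seeds.
spread = {name: [kmeans(X, 3, init, random.Random(s))[2] for s in range(10)]
          for name, init in [("random", init_random),
                             ("k-means++", init_kmeanspp)]}

for k, v in inertia_by_k.items():
    print(k, round(v, 2))
for name, vals in spread.items():
    print(name, "min", round(min(vals), 2), "max", round(max(vals), 2))
```

Comparing the minimum and maximum inertia per seeding method gives a rough view of how sensitive single runs are to initialization, and the inertia curve shows where adding clusters stops paying off. Because inertia always decreases as K grows, the curve alone cannot confirm that the segments are meaningful, which is where part (d) comes in.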

