Which clustering algorithm would you use, and why?
Company: Meta
Role: Data Scientist
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
You need to cluster users for a social product.
Part A (traditional clustering):
- You are given a user feature table (dense numeric/categorical features such as age bucket, country, activity rate, topics engaged, embeddings). Which clustering algorithms would you consider (e.g., k-means, GMM, hierarchical, DBSCAN), and how would you choose among them?
- Describe preprocessing, distance/similarity choices, how you would pick the number of clusters, and how you would evaluate cluster quality.
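One way to make the Part A workflow concrete is the sketch below: standardize numeric features, run k-means, and pick the number of clusters by mean silhouette score. It is a minimal pure-Python illustration, not a production implementation. The feature columns (age, sessions per week), the toy `users` table, and all function names are assumptions made for the example. It handles numeric features only; categorical columns would first need an encoding such as one-hot, or a mixed-type distance. Initialization uses a deterministic farthest-first rule as a stand-in for k-means++ so that the result is reproducible.

```python
import math

def standardize(rows):
    """Z-score each column so no feature dominates Euclidean distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first(points, k):
    """Deterministic init: repeatedly add the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append([sum(c) / len(members) for c in zip(*members)]
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(points, labels):
    """Mean silhouette: (b - a) / max(a, b), with 0 for singleton clusters."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                              # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

def pick_k(points, k_values):
    """Return the k with the highest mean silhouette, plus all scores."""
    scored = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scored, key=scored.get), scored

# Hypothetical toy table: (age, sessions per week) for 12 users in 3 groups.
users = [
    (20, 1), (21, 2), (22, 1), (21, 1),
    (50, 1), (51, 2), (52, 1), (51, 1),
    (20, 30), (21, 31), (22, 30), (21, 30),
]
best_k, scores = pick_k(standardize(users), [2, 3, 4])
```

The silhouette computation here is quadratic in the number of users, so at real scale it would typically be run on a sample. Silhouette also favors compact, well-separated clusters, which matches k-means assumptions but not necessarily density-based methods such as DBSCAN.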
Part B (social network / graph clustering):
- Now assume the core data is a social graph (nodes = users, edges = follows/friends/interactions, possibly weighted and directed).
- What algorithms would you use for clustering/communities on a graph, and how does this differ from “traditional” clustering on a feature matrix?
- Discuss scalability, handling directed/weighted graphs, and how you would evaluate the resulting communities for product use.
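For Part B, the contrast with feature-matrix clustering can be illustrated with a small sketch: communities are found from edge structure alone, and quality is scored with weighted modularity rather than a distance-based metric. The code below is a simplified, pure-Python illustration under several assumptions. Edges are given as hypothetical `(source, target, weight)` tuples, directed edges are symmetrized by summing weights in both directions, self-loops are assumed absent, and label propagation breaks ties by the smallest label so the result is reproducible (the standard algorithm breaks ties at random). Modularity optimizers such as Louvain or Leiden would be alternative choices; they are not implemented here.

```python
from collections import defaultdict

def build_graph(edges):
    """Symmetrize weighted edges (u, v, w) into an undirected adjacency map."""
    adj = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        adj[u][v] += w
        adj[v][u] += w
    return adj

def label_propagation(adj, max_rounds=20):
    """Each node adopts the label with the largest total edge weight among
    its neighbors. Ties go to the smallest label (a simplification)."""
    labels = {n: n for n in adj}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adj):
            votes = defaultdict(float)
            for nbr, w in adj[node].items():
                votes[labels[nbr]] += w
            best = max(votes.values())
            new = min(lab for lab, s in votes.items() if s == best)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels

def modularity(adj, labels):
    """Weighted modularity: sum over communities of
    (internal weight / m) - (total degree / 2m)^2."""
    m = sum(w for n in adj for w in adj[n].values()) / 2.0
    internal = defaultdict(float)
    degree = defaultdict(float)
    for u in adj:
        for v, w in adj[u].items():
            degree[labels[u]] += w
            if labels[u] == labels[v]:
                internal[labels[u]] += w / 2.0  # each edge is seen twice
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical toy graph: two 4-user cliques joined by one weak interaction.
clique_a = [(i, j, 1.0) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j, 1.0) for i in range(4, 8) for j in range(i + 1, 8)]
graph = build_graph(clique_a + clique_b + [(3, 4, 0.5)])

communities = label_propagation(graph)
q_found = modularity(graph, communities)
q_single = modularity(graph, {n: 0 for n in graph})
```

Each label propagation round touches every edge once, which is why this family of methods is often considered for very large graphs. Symmetrizing discards edge direction, so a follow graph where direction matters may call for a direction-aware method instead. Modularity is only a structural score; for product use, communities would still need checks such as stability across runs and alignment with downstream metrics.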
Quick Answer: This prompt tests applied algorithm selection and evaluation for both feature-based clustering and graph community detection. A strong answer covers preprocessing, distance/similarity choices, choosing the number of clusters, cluster and community quality metrics, and scalability.