What difficulty level is this interview question?

This is a medium difficulty Machine Learning question, commonly asked during Technical Screen rounds at Meta.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Meta during technical interviews.

Which clustering algorithm would you use, and why?

Quick Overview

A Meta Data Scientist machine learning screen on choosing a clustering algorithm for social-product users. It contrasts traditional feature-vector clustering (k-means, GMM, hierarchical, DBSCAN/HDBSCAN) with social-graph community detection (Louvain/Leiden, spectral, SBM, node embeddings), and covers preprocessing, choosing the number of clusters, evaluation, directed/weighted graphs, and scaling to millions of users.
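To make the feature-vector side of this contrast concrete, here is a minimal sketch, assuming a toy two-feature table. It standardizes the features, runs k-means, and picks the number of clusters by mean silhouette. The feature names, the synthetic data, and the deterministic farthest-point initialization are illustrative assumptions, not part of the question. In practice a library implementation such as scikit-learn's would replace the hand-rolled functions.

```python
import math
import random


def standardize(rows):
    """Z-score each column so no single feature dominates Euclidean distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b); singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


rng = random.Random(0)
# Hypothetical toy features: (sessions per week, seconds watched per week).
segments = [(2, 1000), (10, 9000), (18, 1000)]
users = [(rng.gauss(cx, 1.0), rng.gauss(cy, 500.0))
         for cx, cy in segments for _ in range(30)]

X = standardize(users)
scores = {k: silhouette(X, kmeans(X, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On this synthetic data the silhouette should peak at the three planted segments. Real user data rarely separates this cleanly, so the score is best read alongside cluster stability and downstream usefulness rather than on its own.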

Question

You need to cluster users for a social product (e.g. Meta) to discover meaningful groups such as communities, interest groups, or usage segments. The data you have may be either, or both, of:

A user feature table: dense numeric/categorical features per user (age bucket, country, activity rate, topics engaged, embeddings, etc.).
A social network graph: nodes = users, edges = friendships / follows / messages / interactions, possibly weighted and directed.

Answer the following:

1. Traditional (feature-vector) clustering. Which clustering algorithms would you consider (e.g. k-means, GMM, hierarchical, DBSCAN/HDBSCAN) and how would you choose among them? Describe preprocessing, distance/similarity choices, how you would pick the number of clusters, and how you would evaluate cluster quality.
2. Social network / graph clustering. If the core data is a social graph instead, what algorithms would you use for community detection, and how does this differ fundamentally from clustering a feature matrix?
3. Directed and weighted graphs. How do you handle direction and edge weights in graph clustering?
4. Hybrid. How would you combine graph structure and user features when both are available?
5. Choosing the number of clusters and evaluating quality. What metrics and validation strategy would you use for both the feature-vector and the graph case?
6. Scale and operations. What practical issues arise at millions of users (compute, dynamic graphs, cold-start, drift) and how would you handle them?
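For the graph parts of the question, the following is a minimal sketch under stated assumptions. Direction is collapsed by summing the two directed weights of each pair, which is one simple option among several. Communities are found with weighted label propagation, used here as a lightweight stand-in for Louvain/Leiden, and the partition is scored with weighted modularity. The edge list and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def symmetrize(directed_edges):
    """Collapse direction by summing w(u->v) + w(v->u) into one undirected weight."""
    adj = defaultdict(lambda: defaultdict(float))
    for u, v, w in directed_edges:
        if u != v:  # ignore self-loops
            adj[u][v] += w
            adj[v][u] += w
    return adj


def label_propagation(adj, max_sweeps=20):
    """Each node repeatedly adopts the label with the largest total edge weight."""
    labels = {node: node for node in adj}
    for _ in range(max_sweeps):
        changed = False
        for node in sorted(adj):
            votes = defaultdict(float)
            for nbr, w in adj[node].items():
                votes[labels[nbr]] += w
            # Deterministic tie-break: highest weight first, then smallest label.
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def modularity(adj, labels):
    """Weighted modularity Q = sum over c of [ L_c / m - (d_c / 2m)^2 ]."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    intra = defaultdict(float)   # 2 * L_c: each intra edge is seen from both ends
    degree = defaultdict(float)  # d_c: total weighted degree of community c
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            degree[labels[u]] += w
            if labels[u] == labels[v]:
                intra[labels[u]] += w
    return sum(intra[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


# Hypothetical directed message counts: two tight friend groups and one weak link.
edges = []
for group in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for u, v in combinations(group, 2):
        edges += [(u, v, 3.0), (v, u, 2.0)]
edges.append((3, 4, 1.0))

adj = symmetrize(edges)
labels = label_propagation(adj)
communities = defaultdict(set)
for node, lab in labels.items():
    communities[lab].add(node)
print(sorted(map(sorted, communities.values())), round(modularity(adj, labels), 3))
```

On this toy graph the sketch should recover the two planted groups. Unlike the feature-vector case, no distance between users is ever computed: the partition depends only on who is connected to whom and how strongly, and the number of communities comes out of the algorithm rather than being fixed in advance. Summing weights discards asymmetry (for example, follower versus followee), so a direction-aware objective may be preferable when that asymmetry matters.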
