Find common friends from directed edges
Company: Microsoft
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You have a directed edge list that records who followed whom. A mutual “friendship” exists only if both directions appear (A→B and B→A). Schema and sample data:
Schema:
FriendEdges(id INT PRIMARY KEY, user_from VARCHAR(10), user_to VARCHAR(10))
Sample rows:
id | user_from | user_to
1 | A | B
2 | B | A
3 | A | C
4 | C | A
5 | A | D
6 | D | A
7 | B | C
8 | C | B
Tasks:
1) Write a single SQL query that derives an undirected friendship table Friends(u1, u2) containing one row per friendship with u1 < u2, based on reciprocal edges in FriendEdges.
2) Using only FriendEdges (no temporary tables), write a single SQL query that returns, for every unordered pair of distinct users (x, y) with x < y, all of their common friends f (users who are friends with both x and y). Output columns: user1, user2, common_friend. Exclude the pair themselves from being counted as their own friend.
3) Extend (2) to also return, per pair (x, y), the count of distinct common friends. Ensure no duplicates even if multiple reciprocal edges are present.
4) Explain the indexes you would add on FriendEdges to make (2) performant on 100M rows, and why.
Quick Answer: This question evaluates a candidate's ability to manipulate relational data and reason about graph reciprocity, specifically testing SQL query formulation, aggregation, deduplication, and indexing for performance in the Data Manipulation (SQL/Python) domain.