You have a directed edge list that records who followed whom. A mutual “friendship” exists only if both directions appear (A→B and B→A). Schema and sample data:
Schema:
FriendEdges(id INT PRIMARY KEY, user_from VARCHAR(10), user_to VARCHAR(10))
Sample rows:
id | user_from | user_to
1 | A | B
2 | B | A
3 | A | C
4 | C | A
5 | A | D
6 | D | A
7 | B | C
8 | C | B
Tasks:
-
Write a single SQL query that derives an undirected friendship table Friends(u1, u2) containing one row per friendship with u1 < u2, based on reciprocal edges in FriendEdges.
-
Using only FriendEdges (no temporary tables), write a single SQL query that returns, for every unordered pair of distinct users (x, y) with x < y, all of their common friends f (users who are friends with both x and y). Output columns: user1, user2, common_friend. Exclude the pair themselves from being counted as their own friend.
-
Extend (2) to also return, per pair (x, y), the count of distinct common friends. Ensure no duplicates even if multiple reciprocal edges are present.
-
Explain the indexes you would add on FriendEdges to make (2) performant on 100M rows, and why.