Write SQL for social feed metrics and ties
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You are given the following schema (PostgreSQL) and sample rows. Assume UTC timestamps and that friendships are static over the sample window.
users(user_id INT, join_date DATE)
friendships(user_id INT, friend_id INT) -- undirected; both directions present
posts(post_id INT, author_id INT, created_at TIMESTAMP)
feed_impressions(imp_id INT, user_id INT, post_id INT, impression_ts TIMESTAMP)
interactions(int_id INT, user_id INT, post_id INT, type VARCHAR CHECK (type IN ('like','comment','share')), interaction_ts TIMESTAMP)
Sample data:
users
+---------+------------+
| user_id | join_date |
+---------+------------+
| 1 | 2025-06-01 |
| 2 | 2025-06-15 |
| 3 | 2025-07-01 |
+---------+------------+
friendships
+---------+-----------+
| user_id | friend_id |
+---------+-----------+
| 1 | 2 |
| 2 | 1 |
+---------+-----------+
posts
+---------+-----------+---------------------+
| post_id | author_id | created_at |
+---------+-----------+---------------------+
| 10 | 2 | 2025-08-01 10:00:00 |
| 11 | 3 | 2025-08-01 11:00:00 |
| 12 | 2 | 2025-08-02 09:00:00 |
| 13 | 3 | 2025-07-15 08:00:00 |
+---------+-----------+---------------------+
feed_impressions
+--------+---------+---------+---------------------+
| imp_id | user_id | post_id | impression_ts |
+--------+---------+---------+---------------------+
| 100 | 1 | 10 | 2025-08-01 10:05:00 |
| 101 | 1 | 11 | 2025-08-01 11:05:00 |
| 102 | 1 | 12 | 2025-08-02 09:10:00 |
| 103 | 2 | 11 | 2025-08-01 12:00:00 |
| 104 | 1 | 13 | 2025-07-15 08:05:00 |
+--------+---------+---------+---------------------+
interactions
+--------+---------+---------+-------------+---------------------+
| int_id | user_id | post_id | type | interaction_ts |
+--------+---------+---------+-------------+---------------------+
| 200 | 1 | 10 | like | 2025-08-01 10:06:00 |
| 201 | 1 | 11 | comment | 2025-08-01 11:06:00 |
| 202 | 1 | 12 | share | 2025-08-02 09:12:00 |
| 203 | 2 | 11 | like | 2025-08-01 12:05:00 |
| 204 | 1 | 13 | like | 2025-07-15 08:06:00 |
+--------+---------+---------+-------------+---------------------+
Tasks (be explicit about the aggregation level; do not accidentally aggregate at the post level when the unit is viewer-day):
1) Classify each impression as 'friend' vs 'unconnected' by joining feed_impressions -> posts.author_id -> friendships relative to the viewer (user_id). Write SQL to compute, for each (user_id, impression_date), the fraction of impressions that are from friends vs unconnected authors.
2) Define a weighted social engagement score per viewer-day and content_source ('friend'/'unconnected') where weights are like=1, comment=3, share=5. Compute the score using interactions by the same viewer on the corresponding posts on that date; impressions with no interactions contribute 0. Return one row per (user_id, dt, content_source) with impressions, interactions_by_type, and weighted_score.
3) For date = '2025-08-01', return the top 2 posts per user by that user's weighted engagement score (same weights as above), breaking ties with dense_rank() so that all tied posts at the cutoff are included; if still tied, order by post_id ASC. Show the SQL.
4) Compute month-over-month percent change in weighted engagement per content_source between 2025-07 and 2025-08, aggregating across all users. Handle missing months by generating a month calendar CTE (2025-07 to 2025-08) and treating absent months as zero before applying LAG. Return: month, content_source, score, mom_pct_change. Explain any assumptions about multiple interactions on the same post and time-zone boundaries.
Quick Answer: This question evaluates advanced SQL analytics skills and competency in data modeling for social feed metrics, including relationship joins, viewer-day temporal aggregation, weighted interaction scoring, and per-user top-N ranking.