Write SQL for hashtag analytics and joins
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Assume today = 2025-09-01. Schema and small sample data are below. Use ANSI SQL; explain any dialect-specific functions you choose. Where asked, explain precisely why LEFT JOIN is required and what would break with INNER JOIN.
Tables:
- Users(user_id INT, join_date DATE)
- Posts(post_id INT, author_id INT, created_at TIMESTAMP)
- PostHashtags(post_id INT, hashtag VARCHAR)
- Impressions(user_id INT, post_id INT, seen_at TIMESTAMP) -- each row = a user saw a post (and any hashtags on it) once
- HashtagClicks(user_id INT, post_id INT, hashtag VARCHAR, clicked_at TIMESTAMP)
Sample rows (minimal but illustrative):
Users
user_id | join_date
1 | 2025-06-01
2 | 2025-07-15
3 | 2025-08-20
Posts
post_id | author_id | created_at
10 | 1 | 2025-08-28 10:00
11 | 2 | 2025-08-30 09:30
12 | 1 | 2025-08-25 12:00
PostHashtags
post_id | hashtag
10 | travel
10 | food
11 | travel
-- post 12 intentionally has no hashtags
Impressions
user_id | post_id | seen_at
1 | 11 | 2025-08-31 18:00
2 | 10 | 2025-08-29 08:00
2 | 11 | 2025-08-31 09:00
3 | 10 | 2025-08-29 09:00
3 | 12 | 2025-08-31 20:00
HashtagClicks
user_id | post_id | hashtag | clicked_at
3 | 10 | travel | 2025-08-29 09:05
2 | 11 | travel | 2025-08-31 09:02
Tasks:
1) Top hashtags by unique viewers in the last 7 days (2025-08-26 to 2025-09-01 inclusive): Return hashtag, unique_viewers, and rank; break ties deterministically. Hint: viewers of a hashtag are users who saw a post containing that hashtag. Use date functions rather than hardcoding the window.
2) Hashtag CTR in the last 7 days: For each hashtag, compute CTR = distinct users who clicked that hashtag on any post ÷ distinct users who were exposed to that hashtag on any post. Ensure the denominator only counts exposures where that hashtag was present. Handle users who saw multiple posts with the same hashtag without double‑counting.
3) Post‑level recommendations: For each post created in the last 7 days, suggest up to 2 hashtags that its viewers have clicked most in the last 30 days (2025-08-03 to 2025-09-01), excluding hashtags already on the post. Write a query that:
- Finds viewers of the post (from Impressions),
- Aggregates their hashtag clicks across any posts in the 30‑day window,
- Excludes hashtags already present on the target post via LEFT JOIN and IS NULL,
- Returns at most 2 suggestions per post, with stable tie‑breaking.
Then: Explain exactly what rows would be lost if you replaced the LEFT JOIN exclusion with an INNER JOIN, especially for posts that currently have no hashtags or when viewers never clicked anything.
4) Quality checks: Write a query to list posts with zero hashtags that still received more than 1 impression in the last 7 days. Explain why RIGHT JOIN is unnecessary here and how a LEFT JOIN against PostHashtags enables detection of missing‑child rows.
Quick Answer: This question evaluates SQL data manipulation skills including join semantics (LEFT vs INNER), deduplicated aggregations, time-window filtering, ranking/tie-breaking, and recommendation logic for computing unique viewers, CTRs, and post-level hashtag suggestions.