Write SQL for hashtag source and safety rates
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
Write SQL for the two tasks below. Assume the schema and sample data as given, and that "today" is 2025-09-01 (UTC dates). Rows that match on (date, user_id, hashtag_id, source) are exact duplicates and count as one follow. If a hashtag_id is missing from the hashtag table, exclude it from task (2) only; it still counts in task (1).
Schema:
Table: following_behavior(date DATE, user_id INT, hashtag_id INT, source VARCHAR) -- each row is a follow event
Table: hashtag(hashtag_id INT, safety VARCHAR) -- safety in {'safety','violating'}
Sample rows for following_behavior (date | user_id | hashtag_id | source):
2025-09-01 | 1 | 100 | hashtag page
2025-09-01 | 1 | 100 | hashtag page -- duplicate
2025-09-01 | 1 | 101 | feed
2025-09-01 | 2 | 100 | feed
2025-09-01 | 2 | 102 | hashtag page
2025-09-01 | 3 | 103 | hashtag page
2025-08-31 | 4 | 100 | hashtag page -- not today
2025-09-01 | 5 | 104 | feed
2025-09-01 | 6 | 105 | hashtag page
2025-09-01 | 6 | 105 | feed
2025-09-01 | 7 | 106 | hashtag page
2025-09-01 | 8 | 107 | feed
2025-09-01 | 9 | 108 | hashtag page
2025-09-01 | 10 | 109 | feed
2025-09-01 | 11 | 110 | hashtag page
2025-09-01 | 12 | 999 | hashtag page -- hashtag 999 missing from hashtag table
Sample rows for hashtag (hashtag_id | safety):
100 | safety
101 | violating
102 | safety
103 | violating
104 | safety
105 | safety
106 | violating
107 | safety
108 | violating
109 | safety
110 | safety
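Before writing the two queries, it can help to load the sample rows and check what deduplication and the date filter actually leave behind. The sketch below does this with an in-memory SQLite database driven from Python. SQLite, the Python wrapper, and the names FOLLOWS, HASHTAGS, raw_today, distinct_today and missing_ids are assumptions made for illustration, not part of the question. The date column is stored as ISO-8601 text because SQLite has no native DATE type.

```python
import sqlite3

# Sample rows copied from the problem statement.
FOLLOWS = [
    ("2025-09-01", 1, 100, "hashtag page"),
    ("2025-09-01", 1, 100, "hashtag page"),  # exact duplicate
    ("2025-09-01", 1, 101, "feed"),
    ("2025-09-01", 2, 100, "feed"),
    ("2025-09-01", 2, 102, "hashtag page"),
    ("2025-09-01", 3, 103, "hashtag page"),
    ("2025-08-31", 4, 100, "hashtag page"),  # not today
    ("2025-09-01", 5, 104, "feed"),
    ("2025-09-01", 6, 105, "hashtag page"),
    ("2025-09-01", 6, 105, "feed"),
    ("2025-09-01", 7, 106, "hashtag page"),
    ("2025-09-01", 8, 107, "feed"),
    ("2025-09-01", 9, 108, "hashtag page"),
    ("2025-09-01", 10, 109, "feed"),
    ("2025-09-01", 11, 110, "hashtag page"),
    ("2025-09-01", 12, 999, "hashtag page"),  # 999 is not in hashtag
]
HASHTAGS = [
    (100, "safety"), (101, "violating"), (102, "safety"), (103, "violating"),
    (104, "safety"), (105, "safety"), (106, "violating"), (107, "safety"),
    (108, "violating"), (109, "safety"), (110, "safety"),
]

conn = sqlite3.connect(":memory:")
# Assumption: dates are kept as ISO-8601 text, which compares correctly as strings.
conn.execute(
    "CREATE TABLE following_behavior "
    "(date TEXT, user_id INT, hashtag_id INT, source VARCHAR)"
)
conn.execute("CREATE TABLE hashtag (hashtag_id INT, safety VARCHAR)")
conn.executemany("INSERT INTO following_behavior VALUES (?, ?, ?, ?)", FOLLOWS)
conn.executemany("INSERT INTO hashtag VALUES (?, ?)", HASHTAGS)

# Rows dated today, before deduplication.
raw_today = conn.execute(
    "SELECT COUNT(*) FROM following_behavior WHERE date = '2025-09-01'"
).fetchone()[0]

# Rows dated today after collapsing exact duplicates on all four columns.
distinct_today = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT date, user_id, hashtag_id, source
        FROM following_behavior
        WHERE date = '2025-09-01'
    )
""").fetchone()[0]

# hashtag_ids that appear in follow events but not in the hashtag table.
missing_ids = [row[0] for row in conn.execute("""
    SELECT DISTINCT f.hashtag_id
    FROM following_behavior AS f
    LEFT JOIN hashtag AS h ON h.hashtag_id = f.hashtag_id
    WHERE h.hashtag_id IS NULL
    ORDER BY f.hashtag_id
""")]

print(raw_today, distinct_today, missing_ids)
```

On the sample data this leaves 14 distinct follows for today out of 15 raw rows, and the only unmatched hashtag_id is 999.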
Tasks:
(1) Which source ('hashtag page' vs 'feed') has the most follows today? Return: source, follows_today, and rank (1=most). Break ties by alphabetical source.
(2) What percent of today’s follows from source='hashtag page' are on violating hashtags? Return a single row with pct_violating (0–100 with two decimals).
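One possible way to answer both tasks is sketched below, again using an in-memory SQLite database from Python so the SQL can be run against the sample rows. Treat it as a sketch under stated assumptions rather than a reference solution. It assumes SQLite 3.25 or later (needed for window functions). It uses ROW_NUMBER so that the alphabetical tie-break always yields distinct ranks. It uses an inner join in task (2) to drop follows whose hashtag_id is absent from hashtag. The names TASK1_SQL, TASK2_SQL, task1_rows and pct_violating are hypothetical, and "rank" is quoted because it is a reserved word in some SQL dialects.

```python
import sqlite3

# Sample rows copied from the problem statement.
FOLLOWS = [
    ("2025-09-01", 1, 100, "hashtag page"),
    ("2025-09-01", 1, 100, "hashtag page"),  # exact duplicate
    ("2025-09-01", 1, 101, "feed"),
    ("2025-09-01", 2, 100, "feed"),
    ("2025-09-01", 2, 102, "hashtag page"),
    ("2025-09-01", 3, 103, "hashtag page"),
    ("2025-08-31", 4, 100, "hashtag page"),  # not today
    ("2025-09-01", 5, 104, "feed"),
    ("2025-09-01", 6, 105, "hashtag page"),
    ("2025-09-01", 6, 105, "feed"),
    ("2025-09-01", 7, 106, "hashtag page"),
    ("2025-09-01", 8, 107, "feed"),
    ("2025-09-01", 9, 108, "hashtag page"),
    ("2025-09-01", 10, 109, "feed"),
    ("2025-09-01", 11, 110, "hashtag page"),
    ("2025-09-01", 12, 999, "hashtag page"),  # 999 is not in hashtag
]
HASHTAGS = [
    (100, "safety"), (101, "violating"), (102, "safety"), (103, "violating"),
    (104, "safety"), (105, "safety"), (106, "violating"), (107, "safety"),
    (108, "violating"), (109, "safety"), (110, "safety"),
]

conn = sqlite3.connect(":memory:")
# Assumption: dates are kept as ISO-8601 text (SQLite has no native DATE type).
conn.execute(
    "CREATE TABLE following_behavior "
    "(date TEXT, user_id INT, hashtag_id INT, source VARCHAR)"
)
conn.execute("CREATE TABLE hashtag (hashtag_id INT, safety VARCHAR)")
conn.executemany("INSERT INTO following_behavior VALUES (?, ?, ?, ?)", FOLLOWS)
conn.executemany("INSERT INTO hashtag VALUES (?, ?)", HASHTAGS)

# Task (1): deduplicate, keep today's rows, count per source, then rank.
# Unknown hashtag_ids (such as 999) are kept because no join is involved.
TASK1_SQL = """
WITH dedup AS (
    SELECT DISTINCT date, user_id, hashtag_id, source
    FROM following_behavior
    WHERE date = '2025-09-01'
),
counts AS (
    SELECT source, COUNT(*) AS follows_today
    FROM dedup
    WHERE source IN ('hashtag page', 'feed')
    GROUP BY source
)
SELECT source,
       follows_today,
       ROW_NUMBER() OVER (ORDER BY follows_today DESC, source ASC) AS "rank"
FROM counts
ORDER BY "rank"
"""

# Task (2): same deduplication, restricted to 'hashtag page'. The inner join
# removes follows whose hashtag_id has no row in hashtag, so they are excluded
# from both the numerator and the denominator.
TASK2_SQL = """
WITH dedup AS (
    SELECT DISTINCT date, user_id, hashtag_id, source
    FROM following_behavior
    WHERE date = '2025-09-01'
      AND source = 'hashtag page'
)
SELECT ROUND(
           100.0 * SUM(CASE WHEN h.safety = 'violating' THEN 1 ELSE 0 END)
           / COUNT(*),
           2
       ) AS pct_violating
FROM dedup AS d
JOIN hashtag AS h ON h.hashtag_id = d.hashtag_id
"""

task1_rows = conn.execute(TASK1_SQL).fetchall()
pct_violating = conn.execute(TASK2_SQL).fetchone()[0]

print(task1_rows)
print(pct_violating)
```

On the sample data, task (1) returns 'hashtag page' with 8 follows at rank 1 and 'feed' with 6 follows at rank 2. Task (2) finds 3 violating follows among the 7 'hashtag page' follows that have a matching hashtag row, which gives 42.86. If no joined rows exist, the division yields NULL in SQLite. Whether that case should return 0 instead is a design choice the question leaves open.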
Quick Answer: This question tests SQL data manipulation: deduplicating events, filtering by date, aggregating and ranking, handling missing foreign-key references, and computing a percentage across an event table and a reference table.