Write SQL for comment analytics
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
You are given the following schema and tiny sample data.
Schema:
- users(user_id INT PRIMARY KEY, country VARCHAR, created_at DATE)
- posts(post_id INT PRIMARY KEY, author_id INT, created_at DATE)
- comments(comment_id INT PRIMARY KEY, post_id INT, user_id INT, created_at DATE, content VARCHAR, is_deleted BOOLEAN)
Sample tables (minimal, not exhaustive):
users
user_id | country | created_at
1 | US | 2025-08-01
2 | US | 2025-08-05
3 | CA | 2025-08-10
4 | US | 2025-08-20
posts
post_id | author_id | created_at
10 | 1 | 2025-08-25
11 | 2 | 2025-08-28
12 | 3 | 2025-08-30
comments
comment_id | post_id | user_id | created_at | content | is_deleted
100 | 10 | 2 | 2025-08-26 | "Nice!" | false
101 | 10 | 3 | 2025-08-27 | "Great post" | false
102 | 11 | 1 | 2025-08-28 | "Thanks" | false
103 | 11 | 2 | 2025-08-30 | "I agree" | true
104 | 12 | 4 | 2025-08-31 | "Wow" | false
105 | 12 | 1 | 2025-09-01 | "Following" | false
106 | 12 | 3 | 2025-09-01 | "Awesome insights"| false
107 | 10 | 1 | 2025-09-01 | "Self" | false
Assume "today" = 2025-09-01. For the 7-day window 2025-08-26 through 2025-09-01 (inclusive), write ANSI SQL (you may assume percentile_cont is available) to:
(a) Return the top 3 posts by distinct commenter count, excluding deleted comments. Output: post_id, unique_commenters, distinct_commenter_countries. Break ties by newer posts (posts.created_at desc) then lower post_id.
(b) For each post and day in the window, compute P50 and P90 of comment text length (use length(content)) over non-deleted comments. Output: day, post_id, p50_len, p90_len. Include days with at least 1 non-deleted comment.
(c) List users who commented on their own posts in the window, with columns: user_id, self_comment_count, last_self_comment_at. Order by self_comment_count desc, user_id asc.
Aim for readable, efficient queries using CTEs; avoid unnecessary scans.
Quick Answer: This question evaluates proficiency in SQL analytics, specifically joins, filtering, aggregation (distinct counts), percentile functions, time-windowed grouping, handling soft-deleted records, and writing efficient queries via CTEs.