Write SQL for comment analytics

Q: How do I practice SQL interview questions?

PracHub provides an interactive SQL console where you can write and test queries against real database schemas. Get instant feedback and compare your solution with the expected output.

Q: What difficulty level is this coding question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Onsite rounds at Meta.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Meta during technical interviews.

Question

You are given the following schema and tiny sample data.

Schema:
- users(user_id INT PRIMARY KEY, country VARCHAR, created_at DATE)
- posts(post_id INT PRIMARY KEY, author_id INT, created_at DATE)
- comments(comment_id INT PRIMARY KEY, post_id INT, user_id INT, created_at DATE, content VARCHAR, is_deleted BOOLEAN)

Sample tables (minimal, not exhaustive):
users
user_id | country | created_at
1 | US | 2025-08-01
2 | US | 2025-08-05
3 | CA | 2025-08-10
4 | US | 2025-08-20

posts
post_id | author_id | created_at
10 | 1 | 2025-08-25
11 | 2 | 2025-08-28
12 | 3 | 2025-08-30

comments
comment_id | post_id | user_id | created_at | content | is_deleted
100 | 10 | 2 | 2025-08-26 | "Nice!" | false
101 | 10 | 3 | 2025-08-27 | "Great post" | false
102 | 11 | 1 | 2025-08-28 | "Thanks" | false
103 | 11 | 2 | 2025-08-30 | "I agree" | true
104 | 12 | 4 | 2025-08-31 | "Wow" | false
105 | 12 | 1 | 2025-09-01 | "Following" | false
106 | 12 | 3 | 2025-09-01 | "Awesome insights"| false
107 | 10 | 1 | 2025-09-01 | "Self" | false

Assume "today" = 2025-09-01. For the 7-day window 2025-08-26 through 2025-09-01 (inclusive), write ANSI SQL (you may assume percentile_cont is available) to:
(a) Return the top 3 posts by distinct commenter count, excluding deleted comments. Output: post_id, unique_commenters, distinct_commenter_countries. Break ties by newer posts (posts.created_at desc) then lower post_id.
(b) For each post and day in the window, compute P50 and P90 of comment text length (use length(content)) over non-deleted comments. Output: day, post_id, p50_len, p90_len. Include days with at least 1 non-deleted comment.
(c) List users who commented on their own posts in the window, with columns: user_id, self_comment_count, last_self_comment_at. Order by self_comment_count desc, user_id asc.
Aim for readable, efficient queries using CTEs; avoid unnecessary scans.

PracHub · Accepted Answer

This question evaluates proficiency in SQL analytics, specifically joins, filtering, aggregation (distinct counts), percentile functions, time-windowed grouping, handling soft-deleted records, and writing efficient queries via CTEs.

Quick Overview

Quick Overview