Identify Duplicate Posts by User and Date
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
posts
+---------+---------+---------------------+---------------+
| post_id | user_id | created_at | content |
+---------+---------+---------------------+---------------+
| 101 | 1 | 2024-05-01 10:00:00 | hello world |
| 102 | 1 | 2024-05-01 10:00:00 | hello world |
| 103 | 2 | 2024-05-02 12:00:00 | good morning |
| 104 | 1 | 2024-05-03 09:00:00 | hello world |
| 105 | 2 | 2024-05-02 12:05:00 | good morning |
+---------+---------+---------------------+---------------+
##### Scenario
Meta wants to detect duplicate posts so they can surface only unique content in users’ feeds.
##### Question
Given table posts(post_id, user_id, created_at, content), write SQL that returns user_id, content, duplicate_count for posts where the same user posted identical content on the same calendar day (duplicate_count ≥
2).
##### Hints
Group by user_id, content, DATE(created_at); HAVING COUNT(*) > 1.
Quick Answer: This question evaluates a candidate's competency in data manipulation and deduplication, focusing on aggregation and handling of temporal attributes to identify repeated entries by the same user.