Count Same-Day Registered Posters
Company: Bytedance
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates SQL querying skills, including joining relational tables, date handling, aggregation, and deduplicating users across events. Commonly asked in Coding & Algorithms interviews for data scientists, it assesses practical relational database querying and analytics tasks—filtering, grouping, and counting distinct users across tables—and tests practical application of SQL rather than purely conceptual database theory; this summary is in English.
Constraints
- 0 <= len(users), len(posts) <= 200000
- Each `user_id` in `users` is unique
- Timestamps are valid strings in `YYYY-MM-DD HH:MM:SS` format
- A user may publish multiple posts on the same day, but counts only once for that date
- Posts whose `user_id` does not exist in `users` should be ignored
Examples
Input: ([(1, "2023-07-01 08:00:00"), (2, "2023-07-01 09:30:00"), (3, "2023-07-02 10:00:00")], [(10, 1, "2023-07-01 12:00:00"), (11, 1, "2023-07-01 13:00:00"), (12, 2, "2023-07-02 08:00:00"), (13, 3, "2023-07-02 11:00:00"), (14, 2, "2023-07-01 18:00:00")])
Expected Output: [("2023-07-01", 2), ("2023-07-02", 1)]
Explanation: Users 1 and 2 both registered and posted on 2023-07-01. User 1 posted twice but is counted once. User 3 qualifies on 2023-07-02.
Input: ([(1, "2023-01-01 00:00:00"), (2, "2023-01-01 05:00:00"), (3, "2023-01-03 09:00:00"), (4, "2023-01-03 11:00:00")], [(1, 1, "2023-01-01 10:00:00"), (2, 2, "2023-01-02 10:00:00"), (3, 3, "2023-01-03 09:30:00"), (4, 3, "2023-01-03 10:00:00"), (5, 5, "2023-01-03 12:00:00")])
Expected Output: [("2023-01-01", 1), ("2023-01-03", 1)]
Explanation: User 1 qualifies on 2023-01-01. User 3 qualifies on 2023-01-03 and is counted once despite two posts. User 2 posted on a different day, user 4 never posted, and user 5 does not exist in users.
Input: ([], [])
Expected Output: []
Explanation: With no users and no posts, there are no qualifying dates.
Input: ([(1, "2024-02-29 23:59:59")], [(1, 1, "2024-03-01 00:00:00")])
Expected Output: []
Explanation: The user posted, but not on the same calendar date as registration.
Input: ([(7, "2022-12-31 00:00:00")], [(100, 7, "2022-12-31 01:00:00"), (101, 7, "2022-12-31 23:00:00")])
Expected Output: [("2022-12-31", 1)]
Explanation: The same user made multiple posts on the registration day, but should only be counted once.
Hints
- Extract the date part from both timestamps before comparing them.
- To avoid counting multiple posts from the same user on the same day more than once, store matched users in a set before aggregating.