Write SQL for reply-based recipient metrics
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: hard
Interview Round: Technical Screen
You work on a social product and are given two tables, `all_post` and `user`.
Assumptions (use these unless you state otherwise):
- All timestamps are in UTC.
- A “reply” is a row in `all_post` with `post_type = 'reply'` and a non-NULL `post_parent_id` pointing to the parent post.
- A user “receives a reply” when someone replies to one of the user’s posts; i.e., the recipient is the parent post’s `post_author_id`.
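
The recipient definition above amounts to a self-join of `all_post`: each reply row is matched to its parent row, and the parent's author is the recipient. The following is a minimal sketch, assuming SQLite (through Python's built-in `sqlite3` module) as the engine and timestamps stored as ISO-8601 text. The sample rows and the aliases `r` (reply) and `p` (parent) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE all_post (
    post_id            INTEGER PRIMARY KEY,
    post_author_id     INTEGER NOT NULL,
    post_creation_time TEXT    NOT NULL,  -- assumption: ISO-8601 text in UTC
    post_type          TEXT    NOT NULL,
    post_content       TEXT,
    post_parent_id     INTEGER
);
-- Hypothetical rows: user 10 writes a post, users 20 and 30 reply to it.
INSERT INTO all_post VALUES
    (1, 10, '2024-01-01 00:00:00', 'post',  'hello', NULL),
    (2, 20, '2024-01-02 00:00:00', 'reply', 'hi',    1),
    (3, 30, '2024-01-03 00:00:00', 'reply', 'hey',   1);
""")

# r = the reply row, p = its parent post. The inner join drops any row
# whose post_parent_id is NULL, so only real replies survive.
RECIPIENT_SQL = """
SELECT r.post_id        AS reply_id,
       r.post_author_id AS replier_id,
       p.post_author_id AS recipient_id
FROM all_post AS r
JOIN all_post AS p
  ON p.post_id = r.post_parent_id
WHERE r.post_type = 'reply'
ORDER BY r.post_id
"""

reply_pairs = conn.execute(RECIPIENT_SQL).fetchall()
print(reply_pairs)  # [(2, 20, 10), (3, 30, 10)]
```

Every later metric can be built on this (reply, replier, recipient) mapping.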
### Table schemas
`all_post`
- `post_id` BIGINT PRIMARY KEY
- `post_author_id` BIGINT NOT NULL -- FK to `user.user_id`
- `post_creation_time` TIMESTAMP NOT NULL
- `post_type` VARCHAR NOT NULL -- e.g., 'post', 'reply'
- `post_content` TEXT
- `post_parent_id` BIGINT NULL -- FK to `all_post.post_id` (parent post)
`user`
- `user_id` BIGINT PRIMARY KEY
- `age` INT
- `country` VARCHAR
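
For local experimentation, the two schemas can be written as DDL. This is a sketch assuming SQLite through Python's `sqlite3` module. SQLite accepts the declared type names (`BIGINT`, `TIMESTAMP`, `VARCHAR`) but maps them to its own type affinities, so other engines may behave differently. `user` is a reserved word in some SQL dialects, so the sketch quotes it.

```python
import sqlite3

DDL = """
CREATE TABLE "user" (
    user_id BIGINT PRIMARY KEY,
    age     INT,
    country VARCHAR
);
CREATE TABLE all_post (
    post_id            BIGINT PRIMARY KEY,
    post_author_id     BIGINT NOT NULL REFERENCES "user"(user_id),
    post_creation_time TIMESTAMP NOT NULL,
    post_type          VARCHAR NOT NULL,   -- e.g., 'post', 'reply'
    post_content       TEXT,
    post_parent_id     BIGINT REFERENCES all_post(post_id)  -- NULL for top-level posts
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Read the column names back to confirm both tables were created as described.
columns = {
    table: [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    for table in ("user", "all_post")
}
print(columns)
```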
### Tasks
1) **Users receiving 2 replies within 7 days:** Write a SQL query that returns the number of distinct users with **at least one** post that received **at least 2 replies** created within **7 days after that post’s creation time**.
- Output: `user_cnt`
2) **% receiving replies from 2 distinct US repliers:** Among users who have received **at least 1 reply** to any of their posts, compute the percentage of those users who have received replies from **at least 2 distinct reply authors** whose `user.country = 'US'`.
- Output: `pct_users` (0–100 as a percentage, not a fraction)
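
One possible approach to both tasks is sketched below. It is not the only valid answer. It runs on SQLite through Python's `sqlite3` module, and it makes the following assumptions beyond the prompt: timestamps are stored as ISO-8601 text, the 7-day window is inclusive at both ends, self-replies count as replies, and for task 2 the distinct US repliers are counted across all of a recipient's posts rather than per post. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (user_id INTEGER PRIMARY KEY, age INT, country TEXT);
CREATE TABLE all_post (
    post_id INTEGER PRIMARY KEY, post_author_id INTEGER NOT NULL,
    post_creation_time TEXT NOT NULL, post_type TEXT NOT NULL,
    post_content TEXT, post_parent_id INTEGER
);
-- Hypothetical sample data.
INSERT INTO "user" VALUES
    (1, 30, 'US'), (2, 25, 'US'), (3, 41, 'CA'), (4, 19, 'US'), (5, 33, 'IN');
INSERT INTO all_post VALUES
    (100, 1, '2024-01-01 00:00:00', 'post',  'a', NULL),
    (101, 2, '2024-01-02 00:00:00', 'reply', 'b', 100),  -- within 7 days
    (102, 3, '2024-01-05 00:00:00', 'reply', 'c', 100),  -- within 7 days
    (103, 4, '2024-01-20 00:00:00', 'reply', 'd', 100),  -- outside window
    (200, 3, '2024-01-01 00:00:00', 'post',  'e', NULL),
    (201, 2, '2024-01-03 00:00:00', 'reply', 'f', 200),  -- within 7 days
    (202, 2, '2024-01-15 00:00:00', 'reply', 'g', 200),  -- outside window
    (300, 5, '2024-01-01 00:00:00', 'post',  'h', NULL); -- no replies
""")

# Task 1: count replies per parent post inside the window, keep posts with
# at least 2, then count the distinct authors of those posts.
TASK1_SQL = """
WITH qualifying_posts AS (
    SELECT p.post_id, p.post_author_id
    FROM all_post AS p
    JOIN all_post AS r
      ON r.post_parent_id = p.post_id
     AND r.post_type = 'reply'
    WHERE r.post_creation_time >= p.post_creation_time
      AND r.post_creation_time <= datetime(p.post_creation_time, '+7 days')
    GROUP BY p.post_id, p.post_author_id
    HAVING COUNT(*) >= 2
)
SELECT COUNT(DISTINCT post_author_id) AS user_cnt
FROM qualifying_posts
"""

# Task 2: one row per recipient with the number of distinct US repliers;
# the denominator is every recipient that appears at all (>= 1 reply).
TASK2_SQL = """
WITH reply_pairs AS (
    SELECT p.post_author_id AS recipient_id,
           r.post_author_id AS replier_id,
           u.country        AS replier_country
    FROM all_post AS r
    JOIN all_post AS p ON p.post_id = r.post_parent_id
    LEFT JOIN "user" AS u ON u.user_id = r.post_author_id
    WHERE r.post_type = 'reply'
),
per_recipient AS (
    SELECT recipient_id,
           COUNT(DISTINCT CASE WHEN replier_country = 'US'
                               THEN replier_id END) AS us_repliers
    FROM reply_pairs
    GROUP BY recipient_id
)
SELECT 100.0 * SUM(CASE WHEN us_repliers >= 2 THEN 1 ELSE 0 END) / COUNT(*)
       AS pct_users
FROM per_recipient
"""

user_cnt = conn.execute(TASK1_SQL).fetchone()[0]
pct_users = conn.execute(TASK2_SQL).fetchone()[0]
print(user_cnt, pct_users)  # 1 50.0
```

In the sample data, only user 1 has a post with two replies inside the window, so `user_cnt` is 1. Users 1 and 3 have each received at least one reply, and only user 1 has two distinct US repliers (users 2 and 4), so `pct_users` is 50.0. In other dialects the date arithmetic differs (for example, interval addition instead of `datetime(..., '+7 days')`), and the multiplication by `100.0` is what keeps the division from truncating to an integer.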
Quick Answer: This Data Manipulation (SQL/Python) question for a Data Scientist role tests SQL joins across the post and user tables, aggregation and grouping, time-based filtering, deduplication, conditional counting, and percentage calculation.