Compute reply-based user metrics in 7 days
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Technical Screen
You are analyzing discussions on a social platform.
## Tables
### `all_post`
- `post_id` (BIGINT, PK)
- `post_author_id` (BIGINT, FK → `user.user_id`)
- `post_creation_time` (TIMESTAMP, UTC)
- `post_type` (VARCHAR) — e.g., `'root'` for an original post and `'reply'` for a reply
- `post_content` (TEXT)
- `post_parent_id` (BIGINT, nullable) — if this is a reply, points to the parent post’s `post_id`; `NULL` for root posts
### `user`
- `user_id` (BIGINT, PK)
- `age` (INT)
- `country` (VARCHAR) — e.g., `'US'`
## Definitions / Assumptions
- A **reply** is a row in `all_post` with `post_parent_id IS NOT NULL`.
- A user **receives a reply** when someone creates a reply whose `post_parent_id` points to a post authored by that user.
- “Within 7 days” means `reply.post_creation_time` is in `[parent.post_creation_time, parent.post_creation_time + INTERVAL '7 days']` (inclusive). Use UTC.
## Tasks
1) **Count users who received at least two replies within 7 days**
- Count distinct `post_author_id` (the parent post’s author) who have **at least one parent post** that received **≥ 2 replies** within 7 days of that parent post’s creation.
- Output: a single row with `num_users`.
2) **Percentage of users who received replies from ≥ 2 distinct US users**
- For each recipient user (the parent post’s author), look at all replies to their posts (no time window unless you choose to state one).
- A recipient qualifies if they received replies from **at least 2 distinct repliers** where the replier’s `user.country = 'US'`.
- Denominator: all users who received **at least 1 reply** (from anyone).
- Output: a single row with `pct_users` (as a decimal or percent; specify which in your query).
Note: Task (1) and task (2) have different qualifying criteria; be careful not to mix the populations.
Quick Answer: This question evaluates competency in SQL-based data manipulation and temporal user-metric calculation, focusing on aggregations and distinct-user counts derived from reply relationships within defined time windows.