This Data Manipulation (SQL/Python) question for a Data Scientist evaluates SQL skills including joins, aggregation, grouping, time-based filtering, deduplication, conditional counting, and percentage calculations across related tables (posts and users).
You work on a social product and are given two tables.
Assumptions (use these unless you state otherwise):
all_post
with
post_type = 'reply'
and a non-NULL
post_parent_id
pointing to the parent post.
post_author_id
.
all_post
post_id
BIGINT PRIMARY KEY
post_author_id
BIGINT NOT NULL -- FK to
user.user_id
post_creation_time
TIMESTAMP NOT NULL
post_type
VARCHAR NOT NULL -- e.g., 'post', 'reply'
post_content
TEXT
post_parent_id
BIGINT NULL -- FK to
all_post.post_id
(parent post)
user
user_id
BIGINT PRIMARY KEY
age
INT
country
VARCHAR
user_cnt
user.country = 'US'
.
pct_users
(0–100 as a percentage, not a fraction)