You are given user and post data for an online forum platform.
Tables (schemas)
Assume the following tables:
users
-
user_id
BIGINT (PK)
-
home_country_code
VARCHAR (nullable; e.g., 'US', 'CA')
-
created_at
TIMESTAMP
posts
-
post_id
BIGINT (PK)
-
user_id
BIGINT (FK →
users.user_id
)
-
forum_id
BIGINT (FK →
forums.forum_id
)
-
created_at
TIMESTAMP
-
ip_country_code
VARCHAR (nullable; country inferred from the posting IP)
forums
-
forum_id
BIGINT (PK)
-
forum_name
VARCHAR
Time assumptions
-
Use the last
30 days
of
posts.created_at
relative to
CURRENT_DATE
.
-
Treat timestamps as UTC.
Tasks
Write SQL queries for the following:
-
Identify which users are “in the US.”
-
Define a user as “in the US” if
either
:
-
users.home_country_code = 'US'
,
or
-
among their posts in the last 30 days,
at least 50%
have
ip_country_code = 'US'
.
-
Output columns:
-
user_id
-
us_flag
(1 if in US else 0)
-
us_post_share_30d
(US posts / total posts in last 30 days; null if no posts)
-
Find the most active forums in the last 30 days.
-
“Most active” should be measured by:
-
primary metric:
post_count_30d
-
tie-breaker:
active_user_count_30d
(distinct posters)
-
Output the
top 10
forums with columns:
-
forum_id
-
forum_name
-
post_count_30d
-
active_user_count_30d
Notes:
-
Handle null country codes safely.
-
You may use CTEs and window functions.