PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Data Manipulation (SQL/Python)/Reddit

Query US-based users and most active forums

Last updated: Mar 29, 2026

Quick Overview

This question evaluates SQL data manipulation skills, including time-based filtering, joins, aggregations, null-safe handling, ratio computation, distinct counts, and top-N ranking to determine user residency and forum activity.

  • Medium
  • Reddit
  • Data Manipulation (SQL/Python)
  • Data Scientist

Query US-based users and most active forums

Company: Reddit

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

You are given user and post data for an online forum platform. ## Tables (schemas) Assume the following tables: ### `users` - `user_id` BIGINT (PK) - `home_country_code` VARCHAR (nullable; e.g., 'US', 'CA') - `created_at` TIMESTAMP ### `posts` - `post_id` BIGINT (PK) - `user_id` BIGINT (FK → `users.user_id`) - `forum_id` BIGINT (FK → `forums.forum_id`) - `created_at` TIMESTAMP - `ip_country_code` VARCHAR (nullable; country inferred from the posting IP) ### `forums` - `forum_id` BIGINT (PK) - `forum_name` VARCHAR ## Time assumptions - Use the last **30 days** of `posts.created_at` relative to `CURRENT_DATE`. - Treat timestamps as UTC. ## Tasks Write SQL queries for the following: 1) **Identify which users are “in the US.”** - Define a user as “in the US” if **either**: - `users.home_country_code = 'US'`, **or** - among their posts in the last 30 days, **at least 50%** have `ip_country_code = 'US'`. - Output columns: - `user_id` - `us_flag` (1 if in US else 0) - `us_post_share_30d` (US posts / total posts in last 30 days; null if no posts) 2) **Find the most active forums in the last 30 days.** - “Most active” should be measured by: - primary metric: `post_count_30d` - tie-breaker: `active_user_count_30d` (distinct posters) - Output the **top 10** forums with columns: - `forum_id` - `forum_name` - `post_count_30d` - `active_user_count_30d` Notes: - Handle null country codes safely. - You may use CTEs and window functions.

Quick Answer: This question evaluates SQL data manipulation skills, including time-based filtering, joins, aggregations, null-safe handling, ratio computation, distinct counts, and top-N ranking to determine user residency and forum activity.

Related Interview Questions

  • Load and prepare JSON for modeling - Reddit (Medium)
Reddit logo
Reddit
Sep 19, 2025, 12:00 AM
Data Scientist
Technical Screen
Data Manipulation (SQL/Python)
3
0

You are given user and post data for an online forum platform.

Tables (schemas)

Assume the following tables:

users

  • user_id BIGINT (PK)
  • home_country_code VARCHAR (nullable; e.g., 'US', 'CA')
  • created_at TIMESTAMP

posts

  • post_id BIGINT (PK)
  • user_id BIGINT (FK → users.user_id )
  • forum_id BIGINT (FK → forums.forum_id )
  • created_at TIMESTAMP
  • ip_country_code VARCHAR (nullable; country inferred from the posting IP)

forums

  • forum_id BIGINT (PK)
  • forum_name VARCHAR

Time assumptions

  • Use the last 30 days of posts.created_at relative to CURRENT_DATE .
  • Treat timestamps as UTC.

Tasks

Write SQL queries for the following:

  1. Identify which users are “in the US.”
    • Define a user as “in the US” if either :
      • users.home_country_code = 'US' , or
      • among their posts in the last 30 days, at least 50% have ip_country_code = 'US' .
    • Output columns:
      • user_id
      • us_flag (1 if in US else 0)
      • us_post_share_30d (US posts / total posts in last 30 days; null if no posts)
  2. Find the most active forums in the last 30 days.
    • “Most active” should be measured by:
      • primary metric: post_count_30d
      • tie-breaker: active_user_count_30d (distinct posters)
    • Output the top 10 forums with columns:
      • forum_id
      • forum_name
      • post_count_30d
      • active_user_count_30d

Notes:

  • Handle null country codes safely.
  • You may use CTEs and window functions.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Reddit•More Data Scientist•Reddit Data Scientist•Reddit Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.