Write SQL to rank categories by impressions
Company: Pinterest
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
You are given the following schema and sample data. Today is 2025-09-01.
Schema:
- users(user_id INT, country_id VARCHAR)
- countries(country_id VARCHAR, country_name VARCHAR)
- daily_impressions(dt DATE, user_id INT, category VARCHAR, impressions INT)
- app_events(dt DATE, user_id INT) -- each row = user active on that day
- feature_usage(dt DATE, user_id INT, feature VARCHAR) -- one row per feature used per day
Sample tables (small, non-exhaustive):
users
user_id | country_id
1 | US
2 | US
3 | CN
4 | IN
countries
country_id | country_name
US | United States
CN | China
IN | India
daily_impressions
dt | user_id | category | impressions
2025-08-29 | 1 | Pets | 100
2025-08-30 | 1 | Pets | 50
2025-08-31 | 2 | Fashion | 70
2025-08-31 | 3 | Pets | 80
2025-08-31 | 4 | Electronics | 90
2025-08-15 | 1 | Fashion | 10
app_events
dt | user_id
2025-08-26 | 1
2025-08-27 | 1
2025-08-28 | 1
2025-08-29 | 1
2025-08-30 | 2
2025-08-31 | 2
2025-08-27 | 3
2025-08-30 | 3
2025-08-31 | 3
2025-09-01 | 3
2025-08-28 | 4
feature_usage
dt | user_id | feature
2025-08-29 | 1 | Search
2025-08-29 | 1 | Share
2025-08-29 | 1 | Save
2025-08-27 | 3 | Search
2025-08-27 | 3 | Share
2025-08-27 | 3 | Save
2025-08-31 | 2 | Search
2025-08-31 | 2 | Share
2025-09-01 | 3 | Comment
2025-09-01 | 3 | Save
2025-09-01 | 3 | Share
Tasks:
A) For August 2025 (2025-08-01 to 2025-08-31), return the category with the highest total impressions per country. Output: country_name, top_category, top_impressions, and the share of that category among the country’s August impressions (as a decimal rounded to 4 dp). Break ties by the lexicographically smallest category. Use only the provided tables; joining three tables (daily_impressions → users → countries) is expected.
B) For the last 7 days ending today (2025-08-26 to 2025-09-01 inclusive), compute the heavy_user_rate per country. Heavy user definition: active on at least 4 distinct days in the window (from app_events) AND on at least one of those days used 3+ distinct features (from feature_usage). Define active_users as users with ≥1 app_events day in the window. Output: country_name, heavy_user_count, active_user_count, heavy_user_rate = heavy_user_count/active_user_count (round 4 dp). Consider users with missing country_id by excluding them from both numerator and denominator.
Requirements:
- Write standard SQL (CTEs allowed). Avoid windowing over excessively large intermediate sets. Ensure correct handling of users with no impressions (for A) and no feature_usage (for B). Explain how you would index these tables to make A and B performant on 10^9 daily_impressions rows.
Quick Answer: This question evaluates proficiency in SQL data manipulation—covering joins, aggregations, grouping, tie-breaking, date-window logic, and scalability considerations such as indexing and query performance—in the Data Manipulation (SQL/Python) domain.