Compute CTR by format for new US users
Company: Pinterest
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You are given three tables. Write a single SQL query to compute click-through rate (CTR) by pin_format for NEW users in the US, where NEW users are those whose sign_up_date is within 30 days (inclusive) of the action's event_date.
Schemas:
- events(event_date DATE, user_id BIGINT, pin_id BIGINT, event_type STRING CHECK IN ('impression','click'), event_count INT)
- users(user_id BIGINT, country STRING, sign_up_date DATE)
- pin_classification(pin_id BIGINT, pin_format STRING)
Sample data:
Table: events
event_date | user_id | pin_id | event_type | event_count
2025-08-15 | 101 | 10 | impression | 100
2025-08-15 | 101 | 10 | click | 5
2025-08-20 | 102 | 11 | impression | 50
2025-08-22 | 103 | 12 | impression | 80
2025-08-22 | 103 | 12 | click | 8
2025-08-25 | 104 | 11 | impression | 200
2025-08-25 | 104 | 11 | click | 40
Table: users
user_id | country | sign_up_date
101 | US | 2025-08-01
102 | US | 2025-07-10
103 | CA | 2025-08-10
104 | US | 2025-08-21
Table: pin_classification
pin_id | pin_format
10 | video
11 | static
12 | video
Requirements:
- Consider only events where users.country = 'US'.
- Define NEW users as 0 <= DATEDIFF(event_date, sign_up_date) <= 30.
- CTR per pin_format = SUM(click event_count) / SUM(impression event_count) across all NEW US users' events.
- Return columns: pin_format, impressions, clicks, ctr where
impressions = SUM(CASE WHEN event_type = 'impression' THEN event_count ELSE 0 END),
clicks = SUM(CASE WHEN event_type = 'click' THEN event_count ELSE 0 END),
ctr = clicks / NULLIF(impressions, 0).
- Exclude pin_formats with zero impressions after filtering.
- Round ctr to 4 decimal places and order by ctr DESC; ties broken by pin_format ASC.
- Your query must be a single SELECT (CTEs allowed), handle duplicate rows safely, and not double-count events.
Quick Answer: This question evaluates a candidate's competency in SQL data manipulation and aggregation, covering joins, date-based filtering for user cohorts, deduplication, conditional summing, null-safe arithmetic, rounding, and ordering to compute a business metric (CTR).