Solve SQL CTR and Python analytics tasks

Q: How do I practice SQL interview questions?

PracHub provides an interactive SQL console where you can write and test queries against real database schemas. Get instant feedback and compare your solution with the expected output.

Q: What difficulty level is this coding question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during HR Screen rounds at Upstart.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Upstart during technical interviews.

Question

Part A — SQL: Compute click-through rate (CTR) by pin_format for US new users. New users are those whose signup_date is within 30 days (inclusive) of the action event_date.
Schema (invented):
- events(event_date DATE, user_id INT, pin_id INT, event_type STRING CHECK IN ('impression','click'), event_count INT)
- users(user_id INT, country_code STRING, signup_date DATE)
- pin_classification(pin_id INT, pin_format STRING CHECK IN ('video','static','story','other'))
Notes/requirements:
- Join events→users on user_id and events→pin_classification on pin_id.
- Filter users.country_code = 'US'.
- Define new user as 0 ≤ DATEDIFF(event_date, signup_date) ≤ 30 (inclusive).
- Aggregate impressions = SUM(event_count WHERE event_type='impression'); clicks = SUM(event_count WHERE event_type='click').
- CTR = clicks / impressions, computed after aggregation at the pin_format level; exclude pin_formats whose aggregated impressions = 0 to avoid division by zero.
- Return one row per pin_format with columns: pin_format, impressions, clicks, ctr (rounded to 4 decimals), and the distinct_new_users contributing to those events.
- Ensure no double counting if multiple rows per (event_date, user_id, pin_id, event_type).
ASCII sample tables:
users
+---------+--------------+-------------+
| user_id | country_code | signup_date |
+---------+--------------+-------------+
| 101 | US | 2025-07-20 |
| 102 | US | 2025-08-05 |
| 103 | CA | 2025-08-01 |
| 104 | US | 2025-06-10 |
+---------+--------------+-------------+

pin_classification
+--------+------------+
| pin_id | pin_format |
+--------+------------+
| 10 | video |
| 11 | static |
| 12 | video |
| 13 | story |
+--------+------------+

events
+------------+---------+--------+------------+-------------+
| event_date | user_id | pin_id | event_type | event_count |
+------------+---------+--------+------------+-------------+
| 2025-08-01 | 101 | 10 | impression | 5 |
| 2025-08-01 | 101 | 10 | click | 1 |
| 2025-08-10 | 102 | 11 | impression | 3 |
| 2025-08-10 | 102 | 11 | click | 0 |
| 2025-08-15 | 103 | 12 | impression | 4 |
| 2025-08-15 | 103 | 12 | click | 2 |
| 2025-08-20 | 104 | 13 | impression | 7 |
| 2025-08-20 | 104 | 13 | click | 1 |
+------------+---------+--------+------------+-------------+
Question: Write a single SQL query that produces the required output, handling all constraints above.

Part B — Python (Pandas + core Python):
1) Given a DataFrame df with columns [pin_id:int, category_id:int, time_spent_seconds:float, pin_format:str] and a dict category_map = {category_id:int -> category_name:str}, return the category_name with the highest average time_spent_seconds among rows where pin_format case-insensitively equals 'video'. Break ties by lexicographically smallest category_name. Exclude null/NaN time_spent_seconds. Aim for vectorized Pandas (no Python loops over rows).
2) Given a dict user_to_pins = {'user1':[1,2,2], 'user2':[1,2,3], ...}, compute the average number of unique pin_ids per user (i.e., mean over users of |set(pin_ids)|), returning a float. Handle empty lists, extremely large lists, and memory efficiency for millions of users. Provide time/space complexity and code.

PracHub · Accepted Answer

This question evaluates SQL and Python data-manipulation competencies, covering date-based user filtering, joins and deduplication across events, users and pin metadata, aggregated CTR calculation with division-by-zero handling, and vectorized Pandas grouping, NaN handling, case-insensitive filtering, and tie-breaking.

Quick Overview

Quick Overview