Write SQL and pandas for shopping events

Q: How do I practice SQL interview questions?

PracHub provides an interactive SQL console where you can write and test queries against real database schemas. Get instant feedback and compare your solution with the expected output.

Q: What difficulty level is this coding question?

This is a Medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Pinterest.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Pinterest during technical interviews.

Question

Use the schema and sample data below to answer SQL and pandas tasks. Treat 'today' as 2025-09-01.

Schema
users(user_id INT, country STRING)
pins(pin_id INT, creator_user_id INT, is_shopping_enabled BOOLEAN, created_at DATE)
events(event_id INT, user_id INT, pin_id INT, event_type STRING, event_ts TIMESTAMP, stay_time_sec INT, feature STRING, product_category_code STRING)
product_categories(code STRING, name STRING)

Sample tables (small)
users
user_id | country
1 | US
2 | US
3 | CA

pins
pin_id | creator_user_id | is_shopping_enabled | created_at
10 | 3 | true | 2025-08-20
11 | 2 | false | 2025-08-22
12 | 1 | true | 2025-08-25

events
event_id | user_id | pin_id | event_type | event_ts | stay_time_sec | feature | product_category_code
101 | 1 | 10 | shopping_click | 2025-08-30 10:00:00 | 35 | shopping_module | A
102 | 1 | 10 | shopping_click | 2025-09-01 09:00:00 | 50 | shopping_module | null
103 | 2 | 12 | shopping_click | 2025-08-31 12:00:00 | null | shopping_module | H
104 | 2 | 11 | view | 2025-09-01 08:00:00 | null | feed | null
105 | 3 | 10 | shopping_click | 2025-08-28 07:00:00 | 15 | shopping_module | A
106 | 3 | 10 | shopping_click | 2025-09-01 11:59:00 | -5 | shopping_module | ?

product_categories
code | name
A | Apparel
H | Home
? | Unknown

Tasks
SQL
1) Daily shopping engagement last 7 days: Write a single SQL query that returns, for each date d in [2025-08-26, 2025-09-01], the columns (d, dau_shopping, clicks, avg_stay_time_pos_sec, rolling_7d_uniq_users). Count only events where event_type='shopping_click' and feature='shopping_module'. Deduplicate rapid repeat clicks per (user_id, pin_id) if they occur within 5 minutes (treat as one); implement dedup with window functions. Exclude stay_time_sec <= 0 or NULL from the average, but still count those clicks in 'clicks'. Ensure dates with no activity appear with zeros using a generated dates CTE.
2) Next-day retention: Among users with at least one shopping_click on 2025-08-31, compute the percentage that also have a shopping_click on 2025-09-01.
3) Top pins per user: For the 7 days ending 2025-09-01, return for each user their top 2 pins by number of deduplicated shopping_clicks; break ties by greater total positive stay_time_sec, then by smallest pin_id.

Python (pandas)
Given a DataFrame events_df with the same columns as events: (a) Map product_category_code using dict = {'A':'Apparel','H':'Home','?':'Unknown'} so missing/unknown codes become 'Unknown'. (b) Replace negative stay_time_sec with NaN; fill remaining NaN stay_time_sec with 0 for aggregation but exclude zeros from averages where appropriate. (c) Sort events_df by ['user_id' asc, 'event_ts' desc, 'stay_time_sec' desc]. (d) Compute, for the last 7 days ending 2025-09-01, each user's top category by total positive stay_time_sec and return a Series user_id -> top_category (tie-breaker: alphabetical).

PracHub · Accepted Answer

This question evaluates a candidate's ability to perform data manipulation in SQL and pandas, covering event deduplication, time-window aggregations, next-day retention calculations, top-N ranking, joins, and data cleaning/mapping for analytics.

Quick Overview

Quick Overview