Write SQL and pandas for shopping events

Q: Write SQL and pandas for shopping events

This is a Data Manipulation (SQL/Python) interview question from Pinterest for Data Scientist roles. View the full question and solution on PracHub.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Question

Use the schema and sample data below to answer SQL and pandas tasks. Treat 'today' as 2025-09-01.

Schema users(user_id INT, country STRING) pins(pin_id INT, creator_user_id INT, is_shopping_enabled BOOLEAN, created_at DATE) events(event_id INT, user_id INT, pin_id INT, event_type STRING, event_ts TIMESTAMP, stay_time_sec INT, feature STRING, product_category_code STRING) product_categories(code STRING, name STRING)

Sample tables (small) users user_id | country 1 | US 2 | US 3 | CA

pins pin_id | creator_user_id | is_shopping_enabled | created_at 10 | 3 | true | 2025-08-20 11 | 2 | false | 2025-08-22 12 | 1 | true | 2025-08-25

events event_id | user_id | pin_id | event_type | event_ts | stay_time_sec | feature | product_category_code 101 | 1 | 10 | shopping_click | 2025-08-30 10:00:00 | 35 | shopping_module | A 102 | 1 | 10 | shopping_click | 2025-09-01 09:00:00 | 50 | shopping_module | null 103 | 2 | 12 | shopping_click | 2025-08-31 12:00:00 | null | shopping_module | H 104 | 2 | 11 | view | 2025-09-01 08:00:00 | null | feed | null 105 | 3 | 10 | shopping_click | 2025-08-28 07:00:00 | 15 | shopping_module | A 106 | 3 | 10 | shopping_click | 2025-09-01 11:59:00 | -5 | shopping_module | ?

product_categories code | name A | Apparel H | Home ? | Unknown

Tasks SQL

Daily shopping engagement last 7 days: Write a single SQL query that returns, for each date d in [2025-08-26, 2025-09-01], the columns (d, dau_shopping, clicks, avg_stay_time_pos_sec, rolling_7d_uniq_users). Count only events where event_type='shopping_click' and feature='shopping_module'. Deduplicate rapid repeat clicks per (user_id, pin_id) if they occur within 5 minutes (treat as one); implement dedup with window functions. Exclude stay_time_sec <= 0 or NULL from the average, but still count those clicks in 'clicks'. Ensure dates with no activity appear with zeros using a generated dates CTE.
Next-day retention: Among users with at least one shopping_click on 2025-08-31, compute the percentage that also have a shopping_click on 2025-09-01.
Top pins per user: For the 7 days ending 2025-09-01, return for each user their top 2 pins by number of deduplicated shopping_clicks; break ties by greater total positive stay_time_sec, then by smallest pin_id.

Python (pandas) Given a DataFrame events_df with the same columns as events: (a) Map product_category_code using dict = {'A':'Apparel','H':'Home','?':'Unknown'} so missing/unknown codes become 'Unknown'. (b) Replace negative stay_time_sec with NaN; fill remaining NaN stay_time_sec with 0 for aggregation but exclude zeros from averages where appropriate. (c) Sort events_df by ['user_id' asc, 'event_ts' desc, 'stay_time_sec' desc]. (d) Compute, for the last 7 days ending 2025-09-01, each user's top category by total positive stay_time_sec and return a Series user_id -> top_category (tie-breaker: alphabetical).

Write SQL and pandas for shopping events

Comments (0)