Query seven-day conversion with windows and dedupe
Company: Snowflake
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Assume all timestamps are UTC. Treat "today" as 2025-09-01. Define "last 7 days" as the inclusive window [2025-08-26 00:00:00, 2025-09-01 23:59:59]. Schema (invented):
users(user_id INT, signup_at TIMESTAMP, tz STRING)
orders(order_id INT, user_id INT, order_ts TIMESTAMP, amount DECIMAL(10,2), status STRING) -- status in ('completed','canceled')
events(event_id INT, user_id INT, event_ts TIMESTAMP, event_type STRING, product_id INT, device_id STRING)
Sample rows:
users
user_id | signup_at | tz
1 | 2025-08-10 09:00:00 | America/Los_Angeles
2 | 2025-08-28 14:00:00 | UTC
3 | 2025-08-30 23:30:00 | America/New_York
orders
order_id | user_id | order_ts | amount | status
101 | 1 | 2025-08-27 23:55:00| 20.00 | completed
102 | 1 | 2025-08-28 00:05:00| 15.00 | canceled
103 | 2 | 2025-08-31 12:00:00| 9.99 | completed
104 | 2 | 2025-09-01 00:00:10| 49.00 | completed
105 | 3 | 2025-08-26 23:59:59| 5.00 | completed
events
event_id | user_id | event_ts | event_type | product_id | device_id
1 | 1 | 2025-08-27 23:50:00| view_product | 555 | A
2 | 1 | 2025-08-27 23:50:00| view_product | 555 | A -- duplicate of 1
3 | 1 | 2025-08-28 00:03:00| add_to_cart | 555 | A
4 | 1 | 2025-08-28 00:04:00| purchase | 555 | A
5 | 2 | 2025-08-31 11:58:00| view_product | 777 | B
6 | 3 | 2025-08-26 23:55:00| view_product | 999 | C
7 | 3 | 2025-08-26 23:58:00| purchase | 999 | C
8 | 2 | 2025-09-01 00:00:10| purchase | 777 | B
Tasks (write a single Standard SQL query; CTEs allowed):
1) At the day level (UTC days), for each day in the 7-day window, compute: unique_viewers = COUNT(DISTINCT user_id with >=1 view_product event that day after deduplicating exact duplicate events), purchasers = COUNT of completed orders that day (exclude canceled), and conversion_rate = purchasers / NULLIF(unique_viewers,0).
2) At the user level, within the same window, return for every user_id: first_completed_order_ts, last_completed_order_ts (NULL if none), and total_completed_orders. Use window functions, not correlated subqueries.
3) Event deduplication: treat rows with identical (user_id, event_ts, event_type, product_id, device_id) as duplicates; keep only one (e.g., ROW_NUMBER over these keys).
4) Output two result sets: (a) daily metrics ordered by day; (b) user-level metrics ordered by user_id.
Bonus: Provide a concise Pandas solution sketch achieving (1)-(3) with correct time windowing and deduplication.
Quick Answer: This question evaluates proficiency in time-windowed aggregations, event deduplication, and window-function–based user-level analytics across SQL and Pandas workflows.