How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

What difficulty level is this interview question?

This is a Medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Snowflake.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Snowflake during technical interviews.

Query seven-day conversion with windows and dedupe

Company: Snowflake

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Assume all timestamps are UTC. Treat "today" as 2025-09-01. Define "last 7 days" as the inclusive window [2025-08-26 00:00:00, 2025-09-01 23:59:59]. Schema (invented): users(user_id INT, signup_at TIMESTAMP, tz STRING) orders(order_id INT, user_id INT, order_ts TIMESTAMP, amount DECIMAL(10,2), status STRING) -- status in ('completed','canceled') events(event_id INT, user_id INT, event_ts TIMESTAMP, event_type STRING, product_id INT, device_id STRING) Sample rows: users user_id | signup_at | tz 1 | 2025-08-10 09:00:00 | America/Los_Angeles 2 | 2025-08-28 14:00:00 | UTC 3 | 2025-08-30 23:30:00 | America/New_York orders order_id | user_id | order_ts | amount | status 101 | 1 | 2025-08-27 23:55:00| 20.00 | completed 102 | 1 | 2025-08-28 00:05:00| 15.00 | canceled 103 | 2 | 2025-08-31 12:00:00| 9.99 | completed 104 | 2 | 2025-09-01 00:00:10| 49.00 | completed 105 | 3 | 2025-08-26 23:59:59| 5.00 | completed events event_id | user_id | event_ts | event_type | product_id | device_id 1 | 1 | 2025-08-27 23:50:00| view_product | 555 | A 2 | 1 | 2025-08-27 23:50:00| view_product | 555 | A -- duplicate of 1 3 | 1 | 2025-08-28 00:03:00| add_to_cart | 555 | A 4 | 1 | 2025-08-28 00:04:00| purchase | 555 | A 5 | 2 | 2025-08-31 11:58:00| view_product | 777 | B 6 | 3 | 2025-08-26 23:55:00| view_product | 999 | C 7 | 3 | 2025-08-26 23:58:00| purchase | 999 | C 8 | 2 | 2025-09-01 00:00:10| purchase | 777 | B Tasks (write a single Standard SQL query; CTEs allowed): 1) At the day level (UTC days), for each day in the 7-day window, compute: unique_viewers = COUNT(DISTINCT user_id with >=1 view_product event that day after deduplicating exact duplicate events), purchasers = COUNT of completed orders that day (exclude canceled), and conversion_rate = purchasers / NULLIF(unique_viewers,0). 2) At the user level, within the same window, return for every user_id: first_completed_order_ts, last_completed_order_ts (NULL if none), and total_completed_orders. Use window functions, not correlated subqueries. 3) Event deduplication: treat rows with identical (user_id, event_ts, event_type, product_id, device_id) as duplicates; keep only one (e.g., ROW_NUMBER over these keys). 4) Output two result sets: (a) daily metrics ordered by day; (b) user-level metrics ordered by user_id. Bonus: Provide a concise Pandas solution sketch achieving (1)-(3) with correct time windowing and deduplication.

Quick Answer: This question evaluates proficiency in time-windowed aggregations, event deduplication, and window-function–based user-level analytics across SQL and Pandas workflows.

orders order_id | user_id | order_ts | amount | status 101 | 1 | 2025-08-27 23:55:00| 20.00 | completed 102 | 1 | 2025-08-28 00:05:00| 15.00 | canceled 103 | 2 | 2025-08-31 12:00:00| 9.99 | completed 104 | 2 | 2025-09-01 00:00:10| 49.00 | completed 105 | 3 | 2025-08-26 23:59:59| 5.00 | completed

events event_id | user_id | event_ts | event_type | product_id | device_id 1 | 1 | 2025-08-27 23:50:00| view_product | 555 | A 2 | 1 | 2025-08-27 23:50:00| view_product | 555 | A -- duplicate of 1 3 | 1 | 2025-08-28 00:03:00| add_to_cart | 555 | A 4 | 1 | 2025-08-28 00:04:00| purchase | 555 | A 5 | 2 | 2025-08-31 11:58:00| view_product | 777 | B 6 | 3 | 2025-08-26 23:55:00| view_product | 999 | C 7 | 3 | 2025-08-26 23:58:00| purchase | 999 | C 8 | 2 | 2025-09-01 00:00:10| purchase | 777 | B Tasks (write a single Standard SQL query; CTEs allowed):

At the day level (UTC days), for each day in the 7-day window, compute: unique_viewers = COUNT(DISTINCT user_id with >=1 view_product event that day after deduplicating exact duplicate events), purchasers = COUNT of completed orders that day (exclude canceled), and conversion_rate = purchasers / NULLIF(unique_viewers,0).
At the user level, within the same window, return for every user_id: first_completed_order_ts, last_completed_order_ts (NULL if none), and total_completed_orders. Use window functions, not correlated subqueries.
Event deduplication: treat rows with identical (user_id, event_ts, event_type, product_id, device_id) as duplicates; keep only one (e.g., ROW_NUMBER over these keys).
Output two result sets: (a) daily metrics ordered by day; (b) user-level metrics ordered by user_id. Bonus: Provide a concise Pandas solution sketch achieving (1)-(3) with correct time windowing and deduplication.

Query seven-day conversion with windows and dedupe

Company: Snowflake

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Quick Answer: This question evaluates proficiency in time-windowed aggregations, event deduplication, and window-function–based user-level analytics across SQL and Pandas workflows.

At the day level (UTC days), for each day in the 7-day window, compute: unique_viewers = COUNT(DISTINCT user_id with >=1 view_product event that day after deduplicating exact duplicate events), purchasers = COUNT of completed orders that day (exclude canceled), and conversion_rate = purchasers / NULLIF(unique_viewers,0).
At the user level, within the same window, return for every user_id: first_completed_order_ts, last_completed_order_ts (NULL if none), and total_completed_orders. Use window functions, not correlated subqueries.
Event deduplication: treat rows with identical (user_id, event_ts, event_type, product_id, device_id) as duplicates; keep only one (e.g., ROW_NUMBER over these keys).
Output two result sets: (a) daily metrics ordered by day; (b) user-level metrics ordered by user_id. Bonus: Provide a concise Pandas solution sketch achieving (1)-(3) with correct time windowing and deduplication.

Query seven-day conversion with windows and dedupe

Quick Overview

Comments (0)

Query seven-day conversion with windows and dedupe

Quick Overview

Comments (0)