
Query seven-day conversion with windows and dedupe

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in time-windowed aggregations, event deduplication, and window-function–based user-level analytics across SQL and Pandas workflows.



Company: Snowflake

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Assume all timestamps are UTC. Treat "today" as 2025-09-01, and define "last 7 days" as the inclusive window [2025-08-26 00:00:00, 2025-09-01 23:59:59].

Schema (invented):

users(user_id INT, signup_at TIMESTAMP, tz STRING)
orders(order_id INT, user_id INT, order_ts TIMESTAMP, amount DECIMAL(10,2), status STRING)  -- status in ('completed','canceled')
events(event_id INT, user_id INT, event_ts TIMESTAMP, event_type STRING, product_id INT, device_id STRING)

Sample rows:

users
user_id | signup_at           | tz
1       | 2025-08-10 09:00:00 | America/Los_Angeles
2       | 2025-08-28 14:00:00 | UTC
3       | 2025-08-30 23:30:00 | America/New_York

orders
order_id | user_id | order_ts            | amount | status
101      | 1       | 2025-08-27 23:55:00 | 20.00  | completed
102      | 1       | 2025-08-28 00:05:00 | 15.00  | canceled
103      | 2       | 2025-08-31 12:00:00 | 9.99   | completed
104      | 2       | 2025-09-01 00:00:10 | 49.00  | completed
105      | 3       | 2025-08-26 23:59:59 | 5.00   | completed

events
event_id | user_id | event_ts            | event_type   | product_id | device_id
1        | 1       | 2025-08-27 23:50:00 | view_product | 555        | A
2        | 1       | 2025-08-27 23:50:00 | view_product | 555        | A   -- duplicate of 1
3        | 1       | 2025-08-28 00:03:00 | add_to_cart  | 555        | A
4        | 1       | 2025-08-28 00:04:00 | purchase     | 555        | A
5        | 2       | 2025-08-31 11:58:00 | view_product | 777        | B
6        | 3       | 2025-08-26 23:55:00 | view_product | 999        | C
7        | 3       | 2025-08-26 23:58:00 | purchase     | 999        | C
8        | 2       | 2025-09-01 00:00:10 | purchase     | 777        | B

Tasks (write a single Standard SQL query; CTEs allowed):

  1. At the day level (UTC days), for each day in the 7-day window, compute: unique_viewers = the count of distinct users with at least one view_product event that day (after deduplicating exact duplicate events); purchasers = the count of completed orders that day (canceled orders excluded); and conversion_rate = purchasers / NULLIF(unique_viewers, 0).
  2. At the user level, within the same window, return for every user_id: first_completed_order_ts and last_completed_order_ts (NULL if none), plus total_completed_orders. Use window functions, not correlated subqueries.
  3. Event deduplication: treat rows with identical (user_id, event_ts, event_type, product_id, device_id) as duplicates and keep only one (e.g., ROW_NUMBER over these keys).
  4. Output two result sets: (a) daily metrics ordered by day; (b) user-level metrics ordered by user_id.

Bonus: Provide a concise Pandas solution sketch achieving (1)-(3) with correct time windowing and deduplication.
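For tasks (1) and (3), one workable shape is a deduplication CTE feeding two daily aggregates joined on day. This is a sketch under the stated UTC window, not a canonical answer; the CTE and alias names (deduped, daily_viewers, daily_purchases) are illustrative, and engines with integer division may need an explicit cast on conversion_rate:

```sql
WITH deduped AS (
    -- Task (3): keep one row per exact-duplicate key.
    SELECT *
    FROM (
        SELECT e.*,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id, event_ts, event_type, product_id, device_id
                   ORDER BY event_id
               ) AS rn
        FROM events e
        WHERE event_ts BETWEEN '2025-08-26 00:00:00' AND '2025-09-01 23:59:59'
    ) t
    WHERE rn = 1
),
daily_viewers AS (
    SELECT CAST(event_ts AS DATE) AS day,
           COUNT(DISTINCT user_id) AS unique_viewers
    FROM deduped
    WHERE event_type = 'view_product'
    GROUP BY CAST(event_ts AS DATE)
),
daily_purchases AS (
    SELECT CAST(order_ts AS DATE) AS day,
           COUNT(*) AS purchasers          -- counts orders, per the prompt
    FROM orders
    WHERE status = 'completed'
      AND order_ts BETWEEN '2025-08-26 00:00:00' AND '2025-09-01 23:59:59'
    GROUP BY CAST(order_ts AS DATE)
)
SELECT COALESCE(v.day, p.day)             AS day,
       COALESCE(v.unique_viewers, 0)      AS unique_viewers,
       COALESCE(p.purchasers, 0)          AS purchasers,
       COALESCE(p.purchasers, 0)
         / NULLIF(COALESCE(v.unique_viewers, 0), 0) AS conversion_rate
FROM daily_viewers v
FULL OUTER JOIN daily_purchases p ON v.day = p.day
ORDER BY day;
```

The FULL OUTER JOIN matters here: on 2025-09-01 there is a completed order but no view_product event, so an inner join would silently drop that day.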
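Task (2) can be sketched with window aggregates over a filtered LEFT JOIN, so that users with no completed orders in the window still appear with NULL timestamps and a zero count (names again illustrative):

```sql
SELECT DISTINCT
       u.user_id,
       MIN(o.order_ts)   OVER (PARTITION BY u.user_id) AS first_completed_order_ts,
       MAX(o.order_ts)   OVER (PARTITION BY u.user_id) AS last_completed_order_ts,
       COUNT(o.order_id) OVER (PARTITION BY u.user_id) AS total_completed_orders
FROM users u
LEFT JOIN orders o
       ON o.user_id = u.user_id
      AND o.status = 'completed'
      AND o.order_ts BETWEEN '2025-08-26 00:00:00' AND '2025-09-01 23:59:59'
ORDER BY u.user_id;
```

Putting the status and date predicates in the join condition (not the WHERE clause) is what preserves the LEFT JOIN semantics; COUNT over a nullable column ignores the NULLs produced by unmatched users, and SELECT DISTINCT collapses the per-order duplicates the window functions leave behind.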
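For the bonus, here is a minimal Pandas sketch of (1)-(3). The frames below re-enter the sample rows by hand; note that `user_level` only covers users with at least one completed order, so a left merge against a users frame would be needed to surface the rest with NaT/0:

```python
import pandas as pd

# Sample frames mirroring the prompt's events and orders tables.
events = pd.DataFrame({
    "event_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "user_id": [1, 1, 1, 1, 2, 3, 3, 2],
    "event_ts": pd.to_datetime([
        "2025-08-27 23:50:00", "2025-08-27 23:50:00", "2025-08-28 00:03:00",
        "2025-08-28 00:04:00", "2025-08-31 11:58:00", "2025-08-26 23:55:00",
        "2025-08-26 23:58:00", "2025-09-01 00:00:10"]),
    "event_type": ["view_product", "view_product", "add_to_cart", "purchase",
                   "view_product", "view_product", "purchase", "purchase"],
    "product_id": [555, 555, 555, 555, 777, 999, 999, 777],
    "device_id": ["A", "A", "A", "A", "B", "C", "C", "B"],
})
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104, 105],
    "user_id": [1, 1, 2, 2, 3],
    "order_ts": pd.to_datetime([
        "2025-08-27 23:55:00", "2025-08-28 00:05:00", "2025-08-31 12:00:00",
        "2025-09-01 00:00:10", "2025-08-26 23:59:59"]),
    "status": ["completed", "canceled", "completed", "completed", "completed"],
})

start = pd.Timestamp("2025-08-26 00:00:00")
end = pd.Timestamp("2025-09-01 23:59:59")

# (3) Drop exact duplicates on the five-column key, then apply the window.
ev = events.drop_duplicates(
    subset=["user_id", "event_ts", "event_type", "product_id", "device_id"])
ev = ev[ev["event_ts"].between(start, end)]

# (1) Daily unique viewers vs. completed-order counts.
views = ev[ev["event_type"] == "view_product"]
viewers = (views.assign(day=views["event_ts"].dt.date)
                .groupby("day")["user_id"].nunique().rename("unique_viewers"))
done = orders[(orders["status"] == "completed")
              & orders["order_ts"].between(start, end)]
purchasers = (done.assign(day=done["order_ts"].dt.date)
                  .groupby("day").size().rename("purchasers"))
daily = pd.concat([viewers, purchasers], axis=1).fillna(0).sort_index()
# NULLIF(unique_viewers, 0) analogue: divide only where viewers > 0.
daily["conversion_rate"] = (
    daily["purchasers"] / daily["unique_viewers"].where(daily["unique_viewers"] > 0))

# (2) First/last completed order and count per user.
user_level = done.groupby("user_id").agg(
    first_completed_order_ts=("order_ts", "min"),
    last_completed_order_ts=("order_ts", "max"),
    total_completed_orders=("order_id", "count"),
).sort_index()
```

The outer concat plays the role of the SQL FULL OUTER JOIN: 2025-09-01 has a purchase but no deduplicated view_product event, so its conversion_rate comes out NaN rather than dividing by zero.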



