You are given three tables. Write one SQL script (CTEs allowed) that answers all parts using window functions and joins (no procedural loops):
Schema:
-
users(user_id INT, signup_date DATE, channel VARCHAR)
-
sessions(session_id INT, user_id INT, session_start TIMESTAMP, country VARCHAR)
-
orders(order_id INT, user_id INT, order_ts TIMESTAMP, amount DECIMAL(10,2))
Sample data:
users
user_id | signup_date | channel
1 | 2025-08-01 | Ads
2 | 2025-08-03 | Organic
3 | 2025-08-05 | Ads
4 | 2025-08-28 | Referral
sessions
session_id | user_id | session_start | country
10 | 1 | 2025-08-02 10:00:00 | US
11 | 1 | 2025-08-08 09:12:00 | US
12 | 2 | 2025-08-04 12:30:00 | US
13 | 3 | 2025-08-06 17:45:00 | CA
14 | 4 | 2025-09-01 00:10:00 | US
orders
order_id | user_id | order_ts | amount
100 | 1 | 2025-08-09 15:00:00 | 20.00
101 | 3 | 2025-08-07 19:00:00 | 10.00
102 | 3 | 2025-09-01 01:00:00 | 5.00
Tasks:
A) For each calendar day D in 2025-08-01..2025-08-31, compute 7-day rolling retention: among users with signup_date <= D, the fraction who had ≥1 session in [D-6, D] (inclusive). Output columns: day, retained_users, eligible_users, retention_rate. Use window functions where appropriate; ensure correct date truncation from session_start.
B) For each user, output first_purchase_date, days_to_first_purchase (from signup_date), and total_amount_before_first_purchase_window (sum of orders strictly before first_purchase_date should be 0 by definition; prove it via ROW_NUMBER/QUALIFY or equivalent instead of MIN subqueries). Show user_id, first_purchase_date, days_to_first_purchase.
C) As of today (2025-09-01), compute top 3 acquisition channels by 7-day ARPU over [2025-08-26, 2025-09-01]: ARPU = total order amount in the window divided by number of active users (users with ≥1 session in the window) from that channel. Output: channel, active_users_7d, revenue_7d, arpu_7d; order by arpu_7d desc and limit 3.
D) Edge cases to handle correctly in your query: users with no sessions; multiple sessions same day; multiple orders on same day; users who signed up after the window; time zones (assume all timestamps are UTC and day boundaries are UTC). Explain briefly in comments where each is handled.
Deliverables: a single SQL script using CTEs and window functions (e.g., ROW_NUMBER, SUM OVER, COUNT DISTINCT via windowing or equivalent) that produces the specified outputs.