
Given the following schema and small samples, answer Parts A–C. Assume timestamps are UTC and "today" is 2025-09-01. Schema: users(id INT, country STRING, hire_date DATE, visa_status STRING, is_remote BOOLEAN); sessions(session_id INT, user_id INT, session_start TIMESTAMP, session_end TIMESTAMP, device STRING); events(event_id INT, session_id INT, user_id INT, event_time TIMESTAMP, event_type STRING, metadata JSON). Sample tables: users +----+---------+------------+------------+-----------+ | id | country | hire_date | visa_status| is_remote | +----+---------+------------+------------+-----------+ | 1 | US | 2025-08-15 | H1B | 1 | | 2 | CN | 2024-12-01 | GC | 0 | | 3 | IN | 2025-06-01 | H1B | 1 | +----+---------+------------+------------+-----------+
sessions +------------+---------+---------------------+---------------------+--------+ | session_id | user_id | session_start | session_end | device | +------------+---------+---------------------+---------------------+--------+ | 10 | 1 | 2025-08-31 09:00:00 | 2025-08-31 09:30:00 | web | | 11 | 1 | 2025-09-01 10:00:00 | 2025-09-01 10:07:00 | mobile | | 12 | 2 | 2025-09-01 11:00:00 | 2025-09-01 11:20:00 | web | | 13 | 3 | 2025-08-25 12:00:00 | 2025-08-25 12:10:00 | web | +------------+---------+---------------------+---------------------+--------+
events +----------+------------+---------+---------------------+------------+------------------------+ | event_id | session_id | user_id | event_time | event_type | metadata | +----------+------------+---------+---------------------+------------+------------------------+ | 100 | 10 | 1 | 2025-08-31 09:01:00 | login | {} | | 101 | 10 | 1 | 2025-08-31 09:05:00 | view | {"page":"pricing"} | | 102 | 10 | 1 | 2025-08-31 09:10:00 | purchase | {"amount":100} | | 103 | 11 | 1 | 2025-09-01 10:01:00 | login | {} | | 104 | 11 | 1 | 2025-09-01 10:02:00 | view | {"page":"home"} | | 105 | 12 | 2 | 2025-09-01 11:05:00 | login | {} | | 106 | 12 | 2 | 2025-09-01 11:10:00 | purchase | {"amount":50} | | 107 | 13 | 3 | 2025-08-25 12:05:00 | login | {} | +----------+------------+---------+---------------------+------------+------------------------+ Part A (SQL): For each date d in [2025-08-26, 2025-09-01], produce one row with: date d; distinct users who started a session that day; distinct users with a login event that day; distinct users with a purchase event that day where the purchase event’s (user_id, session_id, event_time) falls within that user’s session window on day d; conversion rates login→purchase and session→purchase (both as decimals). Treat multiple events by the same user on the same day as one. Ignore events that do not match both user_id and session_id or are outside the session time window. Part B (SQL): For each user, compute first_login_date and first_purchase_date (based only on events that match a valid session window as above) and output a within_7_days flag indicating whether first_purchase_date occurs within 7 days inclusive of first_login_date. Part C (Python): Using pandas DataFrames with the same schemas, implement a function retention_1d(as_of="2025-09-01") that returns the percentage of users whose first_login_date is in [2025-08-25, 2025-09-01] and who have any valid event (matching a session window) on the day exactly one calendar day after their first_login_date. Clearly document how you handle duplicate or orphaned events and time boundaries.