Today is 2025-09-01. Using SQL (optionally outline a pandas approach too), compute daily engagement and D1 retention for the last 7 days (2025-08-26 through 2025-09-01). Exclude any user_id beginning with 'bot_'. Use the following schema and sample data.
Schema: users(user_id STRING, signup_date DATE, country STRING) events(user_id STRING, event_time TIMESTAMP, event_type STRING, product_id STRING) purchases(user_id STRING, purchase_time TIMESTAMP, order_id STRING, amount DECIMAL(10,2))
Sample tables (UTC times): users | user_id | signup_date | country | | u1 | 2025-08-25 | US | | u2 | 2025-08-26 | US | | u3 | 2025-08-26 | IN | | u4 | 2025-08-31 | US | | u5 | 2025-09-01 | US | | bot_1 | 2025-08-26 | US |
events | user_id | event_time | event_type | product_id | | u1 | 2025-08-26 09:00:00 | app_open | NULL | | u1 | 2025-08-27 10:00:00 | app_open | NULL | | u2 | 2025-08-26 12:00:00 | app_open | NULL | | u2 | 2025-08-27 13:00:00 | view | P1 | | u2 | 2025-08-28 12:00:00 | app_open | NULL | | u3 | 2025-08-26 15:00:00 | app_open | NULL | | u3 | 2025-08-27 16:00:00 | app_open | NULL | | u4 | 2025-09-01 08:00:00 | app_open | NULL | | u5 | 2025-09-01 09:00:00 | app_open | NULL | | bot_1 | 2025-08-26 10:00:00 | app_open | NULL |
purchases | user_id | purchase_time | order_id | amount | | u1 | 2025-08-27 11:00:00 | o1 | 9.99 | | u2 | 2025-08-28 11:30:00 | o2 | 4.00 | | u3 | 2025-08-29 14:00:00 | o3 | 2.50 | | u4 | 2025-09-01 09:30:00 | o4 | 1.00 | | u5 | 2025-09-02 10:00:00 | o5 | 5.00 | | bot_1 | 2025-08-26 10:30:00 | o6 | 100.00 |
Task: Produce a single SQL query returning one row per day d with the following columns: (1) day (DATE), (2) dau: distinct users with any event on day d, (3) new_users: users whose first-ever event_time is on day d, (4) d1_retention_rate: among new_users on day d, fraction with any event on day d+1, (5) revenue_per_dau: total purchase amount on day d divided by dau. Notes: treat days in UTC; deduplicate exact-duplicate events via SELECT DISTINCT user_id, event_time, event_type, product_id before aggregating; exclude 'bot_%' users from all metrics. Provide the SQL and, briefly, how you would compute the same using pandas (high-level steps).