Today is 2025-09-01. Using SQL (optionally outline a pandas approach too), compute daily engagement and D1 retention for the last 7 days (2025-08-26 through 2025-09-01). Exclude any user_id beginning with 'bot_'. Use the following schema and sample data.
Schema:
users(user_id STRING, signup_date DATE, country STRING)
events(user_id STRING, event_time TIMESTAMP, event_type STRING, product_id STRING)
purchases(user_id STRING, purchase_time TIMESTAMP, order_id STRING, amount DECIMAL(10,2))
Sample tables (UTC times):
users
| user_id | signup_date | country |
| u1 | 2025-08-25 | US |
| u2 | 2025-08-26 | US |
| u3 | 2025-08-26 | IN |
| u4 | 2025-08-31 | US |
| u5 | 2025-09-01 | US |
| bot_1 | 2025-08-26 | US |
events
| user_id | event_time | event_type | product_id |
| u1 | 2025-08-26 09:00:00 | app_open | NULL |
| u1 | 2025-08-27 10:00:00 | app_open | NULL |
| u2 | 2025-08-26 12:00:00 | app_open | NULL |
| u2 | 2025-08-27 13:00:00 | view | P1 |
| u2 | 2025-08-28 12:00:00 | app_open | NULL |
| u3 | 2025-08-26 15:00:00 | app_open | NULL |
| u3 | 2025-08-27 16:00:00 | app_open | NULL |
| u4 | 2025-09-01 08:00:00 | app_open | NULL |
| u5 | 2025-09-01 09:00:00 | app_open | NULL |
| bot_1 | 2025-08-26 10:00:00 | app_open | NULL |
purchases
| user_id | purchase_time | order_id | amount |
| u1 | 2025-08-27 11:00:00 | o1 | 9.99 |
| u2 | 2025-08-28 11:30:00 | o2 | 4.00 |
| u3 | 2025-08-29 14:00:00 | o3 | 2.50 |
| u4 | 2025-09-01 09:30:00 | o4 | 1.00 |
| u5 | 2025-09-02 10:00:00 | o5 | 5.00 |
| bot_1 | 2025-08-26 10:30:00 | o6 | 100.00 |
Task: Produce a single SQL query returning one row per day d with the following columns: (1) day (DATE), (2) dau: distinct users with any event on day d, (3) new_users: users whose first-ever event_time is on day d, (4) d1_retention_rate: among new_users on day d, fraction with any event on day d+1, (5) revenue_per_dau: total purchase amount on day d divided by dau. Notes: treat days in UTC; deduplicate exact-duplicate events via SELECT DISTINCT user_id, event_time, event_type, product_id before aggregating; exclude 'bot_%' users from all metrics. Provide the SQL and, briefly, how you would compute the same using pandas (high-level steps).