You are given the following schema (timestamps are UTC):
users(user_id INT, country STRING, created_at TIMESTAMP)
events(user_id INT, event_ts TIMESTAMP, event_type STRING) -- only app_open rows are relevant
orders(order_id INT, user_id INT, order_ts TIMESTAMP, status STRING) -- status in ('placed','canceled')
payments(payment_id INT, order_id INT, amount DECIMAL(10,2), payment_ts TIMESTAMP, is_valid BOOL)
refunds(refund_id INT, order_id INT, amount DECIMAL(10,2), refund_ts TIMESTAMP)
Small sample data:
users
+---------+---------+---------------------+
| user_id | country | created_at |
+---------+---------+---------------------+
| 1 | US | 2025-06-20 00:00:00 |
| 2 | US | 2025-07-10 00:00:00 |
| 3 | IN | 2025-08-02 00:00:00 |
| 4 | US | 2025-08-25 00:00:00 |
| 5 | US | 2025-07-28 00:00:00 |
| 6 | BR | 2025-08-15 00:00:00 |
+---------+---------+---------------------+
events (only app_open shown)
+---------+---------------------+-----------+
| user_id | event_ts | event_type|
+---------+---------------------+-----------+
| 1 | 2025-08-05 10:00:00 | app_open |
| 2 | 2025-08-10 09:00:00 | app_open |
| 2 | 2025-08-28 13:00:00 | app_open |
| 3 | 2025-08-03 08:30:00 | app_open |
| 4 | 2025-08-30 20:10:00 | app_open |
| 5 | 2025-08-12 12:00:00 | app_open |
| 6 | 2025-08-31 23:55:00 | app_open |
+---------+---------------------+-----------+
orders
+----------+---------+---------------------+----------+
| order_id | user_id | order_ts | status |
+----------+---------+---------------------+----------+
| 101 | 1 | 2025-08-05 10:05:00 | placed |
| 102 | 2 | 2025-08-10 09:05:00 | placed |
| 103 | 2 | 2025-08-28 13:05:00 | canceled |
| 104 | 3 | 2025-08-03 08:35:00 | placed |
| 105 | 4 | 2025-08-30 20:15:00 | placed |
| 106 | 5 | 2025-08-12 12:05:00 | placed |
| 107 | 6 | 2025-08-31 23:58:00 | placed |
+----------+---------+---------------------+----------+
payments
+------------+----------+--------+---------------------+----------+
| payment_id | order_id | amount | payment_ts | is_valid |
+------------+----------+--------+---------------------+----------+
| 201 | 101 | 50.00 | 2025-08-05 10:06:00 | 1 |
| 202 | 102 | 20.00 | 2025-08-10 09:06:00 | 1 |
| 203 | 103 | 15.00 | 2025-08-28 13:06:00 | 1 |
| 204 | 103 | 15.00 | 2025-08-28 13:06:00 | 0 | -- duplicate/invalid
| 205 | 104 | 30.00 | 2025-09-01 00:01:00 | 1 | -- late payment (Sep)
| 206 | 105 | 100.00 | 2025-08-30 20:16:00 | 1 |
| 207 | 106 | 40.00 | 2025-08-12 12:06:00 | 1 |
| 208 | 107 | 25.00 | 2025-09-01 00:10:00 | 1 | -- late payment (Sep)
+------------+----------+--------+---------------------+----------+
refunds
+-----------+----------+--------+---------------------+
| refund_id | order_id | amount | refund_ts |
+-----------+----------+--------+---------------------+
| 301 | 103 | 15.00 | 2025-08-29 10:00:00 |
| 302 | 105 | 100.00 | 2025-09-02 09:00:00 | -- refund after month-end
| 303 | 106 | 10.00 | 2025-08-20 12:00:00 |
+-----------+----------+--------+---------------------+
Task (one Standard SQL query): For calendar month 2025-08, output one row per signup cohort month (cohort_month = DATE_TRUNC(created_at, MONTH)) with: cohort_month, active_users_aug, payers_aug, payer_rate_aug, gmv_aug_usd. Rules and edge cases to handle precisely:
-
Active users: distinct users in the cohort with at least one events.event_type = 'app_open' during 2025-08-01 to 2025-08-31 inclusive.
-
GMV (gmv_aug_usd): sum of payments.amount where payments.is_valid = 1 and payment_ts in 2025-08, minus sum of refunds.amount where refund_ts in 2025-08, regardless of order status or order_ts. Do not include invalid payments (is_valid=0). GMV may be negative.
-
Payers (payers_aug): among active users in the cohort, count distinct users whose net_august_amount = (valid August payments tied to their orders) minus (August refunds tied to their orders) is strictly > 0. A canceled order with payment and same-month full refund should not count as a payer.
-
Denominator zero: if active_users_aug = 0, return payer_rate_aug = NULL (not 0).
-
Ignore rows with NULL user_id anywhere.
-
Late/early timing: payments/refunds outside August must not affect August metrics, even if the related order_ts is in August.
Return the exact SQL that produces the specified output.