Given two tables, compute control vs treatment (C/T) metrics, apply 24‑hour attribution, and generate a daily plot. Treat “today” as 2025‑09‑01; use the last 7 days window 2025‑08‑26 through 2025‑09‑01 (inclusive). Provide either SQL or Python, and explain any assumptions.
Schema:
-
visits(visit_id STRING, user_id STRING, visit_ts TIMESTAMP UTC, market_id STRING, channel STRING, variant CHAR(1) in {"C","T"})
-
bookings(booking_id STRING, user_id STRING, booking_ts TIMESTAMP UTC, status STRING in {"confirmed","cancelled"}, revenue_usd DECIMAL(10,2))
Attribution rule: attribute each booking to the most recent visit by the same user within the prior 24 hours; otherwise leave unattributed. Deduplicate visits within 60 seconds per user_id by keeping only the latest. Exclude cancelled bookings from conversion and revenue; keep a diagnostic count of cancellations. Ignore bookings without an attributed visit for conversion, but report their count separately.
Required outputs for 2025‑08‑26..2025‑09‑01 by variant and by day:
-
visits, unique_users, attributed_bookings (confirmed), conversion_rate = attributed_bookings / visits, total_revenue_usd (confirmed only), revenue_per_visit, cancellations, unattributed_confirmed_bookings.
-
A line plot of daily conversion_rate with 95% Wilson CIs for C and T.
Edge cases to handle: a user with C and T visits in the window; multiple bookings after one visit; booking outside the 24‑hour window; duplicate visits within 60 seconds.
Small sample data (ASCII) to clarify shapes (not exhaustive):
visits
+----------+---------+---------------------+-----------+---------+---------+
| visit_id | user_id | visit_ts | market_id | channel | variant |
+----------+---------+---------------------+-----------+---------+---------+
| v1 | u101 | 2025-08-26 09:05:00 | NYC | seo | C |
| v2 | u101 | 2025-08-26 20:10:00 | NYC | direct | T |
| v3 | u102 | 2025-08-27 10:00:00 | SFO | paid | C |
| v4 | u103 | 2025-08-27 23:50:00 | NYC | email | T |
| v5 | u104 | 2025-08-28 00:10:00 | NYC | seo | T |
| v6 | u105 | 2025-08-28 12:40:00 | BOS | seo | C |
| v7 | u106 | 2025-08-30 21:55:00 | NYC | direct | T |
| v8 | u106 | 2025-08-30 21:56:00 | NYC | direct | T |
| v9 | u107 | 2025-09-01 09:30:00 | BOS | paid | C |
+----------+---------+---------------------+-----------+---------+---------+
bookings
+-------------+---------+---------------------+-----------+-------------+
| booking_id | user_id | booking_ts | status | revenue_usd |
+-------------+---------+---------------------+-----------+-------------+
| b1 | u101 | 2025-08-26 21:00:00 | confirmed | 220.00 |
| b2 | u102 | 2025-08-27 12:00:00 | confirmed | 150.00 |
| b3 | u103 | 2025-08-28 00:30:00 | confirmed | 300.00 |
| b4 | u104 | 2025-08-29 04:00:00 | confirmed | 180.00 |
| b5 | u106 | 2025-08-31 22:10:00 | confirmed | 250.00 |
| b6 | u105 | 2025-08-30 13:00:00 | cancelled | 200.00 |
| b7 | u108 | 2025-08-29 14:00:00 | confirmed | 120.00 |
+-------------+---------+---------------------+-----------+-------------+
Tasks:
A) Recompute attribution from visits using the 24‑hour rule and deduped visits; produce the required metrics by variant and by day.
B) Output a tidy table with one row per (date, variant) and the metrics above.
C) Produce the described plot.