PostgreSQL. Today is 2025-09-01. You’re given the following schema and toy samples:
users(user_id INT, country TEXT, signup_date DATE, marketing_channel TEXT) +---------+---------+-------------+------------------+ | user_id | country | signup_date | marketing_channel| +---------+---------+-------------+------------------+ | 1 | US | 2025-08-12 | email | | 2 | CA | 2025-08-13 | paid | | 3 | US | 2025-08-11 | organic | | 4 | GB | 2025-08-17 | paid | +---------+---------+-------------+------------------+
exposures(user_id INT, experiment_id TEXT, variant TEXT, exposure_ts TIMESTAMP, source TEXT) +---------+--------------------+---------+---------------------+--------+ | user_id | experiment_id | variant | exposure_ts | source | +---------+--------------------+---------+---------------------+--------+ | 1 | tracker_vlaunch_v1 | control | 2025-08-12 10:00 | web | | 2 | tracker_vlaunch_v1 | B | 2025-08-13 09:35 | email | | 2 | tracker_vlaunch_v1 | control | 2025-08-14 10:18 | web | <- crossover | 3 | tracker_vlaunch_v1 | B | 2025-08-11 08:02 | paid | | 4 | tracker_vlaunch_v1 | B | 2025-08-17 15:10 | web | +---------+--------------------+---------+---------------------+--------+
sessions(user_id INT, session_start TIMESTAMP, device TEXT, is_bot BOOLEAN) +---------+---------------------+---------+--------+ | user_id | session_start | device | is_bot | +---------+---------------------+---------+--------+ | 1 | 2025-08-12 10:00 | ios | false | | 2 | 2025-08-13 09:30 | web | false | | 2 | 2025-08-14 10:10 | web | true | | 3 | 2025-08-11 08:00 | android | false | | 4 | 2025-08-17 15:00 | web | false | +---------+---------------------+---------+--------+
orders(order_id INT, user_id INT, order_ts TIMESTAMP, revenue NUMERIC(10,2), refunded BOOLEAN) +----------+---------+---------------------+---------+----------+ | order_id | user_id | order_ts | revenue | refunded | +----------+---------+---------------------+---------+----------+ | 501 | 1 | 2025-08-20 12:00 | 99.00 | false | | 502 | 2 | 2025-08-28 11:05 | 59.00 | true | | 503 | 3 | 2025-08-13 09:00 | 49.00 | false | | 504 | 4 | 2025-08-30 13:00 | 129.00 | false | +----------+---------+---------------------+---------+----------+
Task A — core metrics (CTEs allowed): For experiment_id = 'tracker_vlaunch_v1' and exposures between 2025-08-01 and 2025-08-24, write a single query that:
Task B — SRM diagnostic: Write a second query that tests for sample ratio mismatch across variants among qualified users using a chi-square goodness-of-fit test vs. expected 50/50. Output observed_count_control, observed_count_B, expected_each, chi2_stat, and an approximate p_value (you may use built-in ln(), exp(), and power() but no UDFs).
Edge cases to handle: users with no sessions; users with only bot sessions; users with orders before exposure; multiple orders within 14 days; users outside US/CA; and the crossover illustrated for user_id=2.