You are analyzing a Disney/Hulu product funnel with events captured in a single table. Write ANSI-SQL (Snowflake/BigQuery compatible) to answer the sub-questions. Use 'today' = 2025-09-01.
Schema
Small sample tables users +---------+---------------------+---------+----------+ | user_id | created_at | country | platform | +---------+---------------------+---------+----------+ | u1 | 2025-08-10 12:00:00 | US | iOS | | u2 | 2025-08-25 09:00:00 | US | Web | | u3 | 2025-08-30 08:30:00 | CA | Android | | u4 | 2025-08-31 21:00:00 | US | iOS | +---------+---------------------+---------+----------+
events +---------------------+--------+------------------+--------+-----------+---------------+-------------+ | event_ts | user_id| event_type | show_id| session_id| revenue_cents | campaign_id | +---------------------+--------+------------------+--------+-----------+---------------+-------------+ | 2025-08-26 10:01:00 | u1 | impression | s1 | sA | NULL | c10 | | 2025-08-26 10:02:00 | u1 | click | s1 | sA | NULL | c10 | | 2025-08-26 10:05:00 | u1 | signup | NULL | sA | NULL | NULL | | 2025-08-27 11:00:00 | u1 | start_subscription| NULL | sA | 7999 | NULL | | 2025-08-30 09:00:00 | u2 | impression | s2 | sB | NULL | c10 | | 2025-08-30 09:01:00 | u2 | click | s2 | sB | NULL | c10 | | 2025-08-31 12:00:00 | u2 | signup | NULL | sB | NULL | NULL | | 2025-09-01 08:00:00 | u3 | impression | s1 | sC | NULL | c99 | | 2025-09-01 08:00:02 | u3 | impression | s1 | sC | NULL | c99 | | 2025-09-01 08:03:00 | u3 | click | s1 | sC | NULL | c99 | | 2025-09-01 08:20:00 | u3 | watch_start | s1 | sC | NULL | NULL | | 2025-09-01 08:50:00 | u3 | watch_end | s1 | sC | NULL | NULL | | 2025-08-31 20:00:00 | u4 | impression | s3 | sD | NULL | NULL | | 2025-09-01 07:30:00 | u4 | click | s3 | sD | NULL | NULL | | 2025-09-01 07:40:00 | u4 | signup | NULL | sD | NULL | NULL | +---------------------+--------+------------------+--------+-----------+---------------+-------------+
shows +--------+------------------+ | show_id| title | +--------+------------------+ | s1 | The Bear | | s2 | Only Murders | | s3 | The Kardashians | +--------+------------------+
Tasks a) Build a daily impression→click→signup→start_subscription funnel for each day d in [2025-08-26, 2025-09-01]. A user counts in a stage if: click occurs within 1 day of their impression on day d; signup within 3 days of that click; start_subscription within 7 days of signup. Attribute the funnel to the impression day. Output: day, impressions, clicks_w1d, signups_w3d, subs_w7d, and stage-to-stage rates. b) For users who signed up in August 2025, compute per signup_date and acquisition_channel (campaign_id NULL=organic, else paid) the median minutes from signup to first watch_start. c) Flag suspicious campaigns in the last 7 days relative to today (window [2025-08-26, 2025-09-01]) where CTR > 0.80, impressions ≥ 100, and distinct users ≤ 5. Return campaign_id, impressions, clicks, ctr, distinct_users. d) Deduplicate events: for identical (user_id, event_type, show_id) within a 5-second window, keep the earliest event_ts and drop the rest. Produce a de-duplicated events CTE. e) Bonus: For each day in August–September 2025, compute a 28-day rolling unique viewers per show_id using watch_start. On 2025-09-01, return the top 3 shows by that 28-day rolling unique viewers, breaking ties by most recent daily unique viewers. Provide final SQL for each sub-part, and explain index/partition choices and how your solution scales on billions of rows.