Write complex SQL for streaming funnels
Company: Disney
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: HR Screen
You are analyzing a Disney/Hulu product funnel with events captured in a single table. Write ANSI-SQL (Snowflake/BigQuery compatible) to answer the sub-questions. Use 'today' = 2025-09-01.
Schema
- users(user_id STRING, created_at TIMESTAMP, country STRING, platform STRING)
- events(event_ts TIMESTAMP, user_id STRING, event_type STRING in ['impression','click','signup','start_subscription','watch_start','watch_end'], show_id STRING NULL, session_id STRING, revenue_cents INT NULL, campaign_id STRING NULL)
- shows(show_id STRING, title STRING)
Small sample tables
users
+---------+---------------------+---------+----------+
| user_id | created_at | country | platform |
+---------+---------------------+---------+----------+
| u1 | 2025-08-10 12:00:00 | US | iOS |
| u2 | 2025-08-25 09:00:00 | US | Web |
| u3 | 2025-08-30 08:30:00 | CA | Android |
| u4 | 2025-08-31 21:00:00 | US | iOS |
+---------+---------------------+---------+----------+
events
+---------------------+--------+------------------+--------+-----------+---------------+-------------+
| event_ts | user_id| event_type | show_id| session_id| revenue_cents | campaign_id |
+---------------------+--------+------------------+--------+-----------+---------------+-------------+
| 2025-08-26 10:01:00 | u1 | impression | s1 | sA | NULL | c10 |
| 2025-08-26 10:02:00 | u1 | click | s1 | sA | NULL | c10 |
| 2025-08-26 10:05:00 | u1 | signup | NULL | sA | NULL | NULL |
| 2025-08-27 11:00:00 | u1 | start_subscription| NULL | sA | 7999 | NULL |
| 2025-08-30 09:00:00 | u2 | impression | s2 | sB | NULL | c10 |
| 2025-08-30 09:01:00 | u2 | click | s2 | sB | NULL | c10 |
| 2025-08-31 12:00:00 | u2 | signup | NULL | sB | NULL | NULL |
| 2025-09-01 08:00:00 | u3 | impression | s1 | sC | NULL | c99 |
| 2025-09-01 08:00:02 | u3 | impression | s1 | sC | NULL | c99 |
| 2025-09-01 08:03:00 | u3 | click | s1 | sC | NULL | c99 |
| 2025-09-01 08:20:00 | u3 | watch_start | s1 | sC | NULL | NULL |
| 2025-09-01 08:50:00 | u3 | watch_end | s1 | sC | NULL | NULL |
| 2025-08-31 20:00:00 | u4 | impression | s3 | sD | NULL | NULL |
| 2025-09-01 07:30:00 | u4 | click | s3 | sD | NULL | NULL |
| 2025-09-01 07:40:00 | u4 | signup | NULL | sD | NULL | NULL |
+---------------------+--------+------------------+--------+-----------+---------------+-------------+
shows
+--------+------------------+
| show_id| title |
+--------+------------------+
| s1 | The Bear |
| s2 | Only Murders |
| s3 | The Kardashians |
+--------+------------------+
Tasks
a) Build a daily impression→click→signup→start_subscription funnel for each day d in [2025-08-26, 2025-09-01]. A user counts in a stage if: click occurs within 1 day of their impression on day d; signup within 3 days of that click; start_subscription within 7 days of signup. Attribute the funnel to the impression day. Output: day, impressions, clicks_w1d, signups_w3d, subs_w7d, and stage-to-stage rates.
b) For users who signed up in August 2025, compute per signup_date and acquisition_channel (campaign_id NULL=organic, else paid) the median minutes from signup to first watch_start.
c) Flag suspicious campaigns in the last 7 days relative to today (window [2025-08-26, 2025-09-01]) where CTR > 0.80, impressions ≥ 100, and distinct users ≤ 5. Return campaign_id, impressions, clicks, ctr, distinct_users.
d) Deduplicate events: for identical (user_id, event_type, show_id) within a 5-second window, keep the earliest event_ts and drop the rest. Produce a de-duplicated events CTE.
e) Bonus: For each day in August–September 2025, compute a 28-day rolling unique viewers per show_id using watch_start. On 2025-09-01, return the top 3 shows by that 28-day rolling unique viewers, breaking ties by most recent daily unique viewers.
Provide final SQL for each sub-part, and explain index/partition choices and how your solution scales on billions of rows.
Quick Answer: This question evaluates a candidate's ability to write complex ANSI-SQL for event-level funnel analysis, including temporal windowing, user attribution, multi-stage aggregation, and handling of event schemas.