Write robust SQL for streaming analytics

Q: How do I practice SQL interview questions?

PracHub provides an interactive SQL console where you can write and test queries against real database schemas. Get instant feedback and compare your solution with the expected output.

Q: What difficulty level is this coding question?

This is a Medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Twitch.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Twitch during technical interviews.

Question

Write efficient PostgreSQL SQL for the following streaming analytics tasks. Use the invented schema and sample data below. Handle duplicates, out-of-order events, and missing stops as specified.

Schema:
streams(stream_id INT PRIMARY KEY, creator_id INT, game_id INT, started_at TIMESTAMP, ended_at TIMESTAMP, is_partner BOOLEAN)
view_events(user_id INT, stream_id INT, ts TIMESTAMP, action TEXT CHECK (action IN ('start','stop')), device TEXT, country TEXT)

Sample data (streams):
+-----------+------------+---------+---------------------+---------------------+------------+
| stream_id | creator_id | game_id | started_at | ended_at | is_partner |
+-----------+------------+---------+---------------------+---------------------+------------+
| 101 | 1 | 10 | 2025-08-31 23:50:00 | 2025-09-01 02:10:00 | true |
| 102 | 2 | 11 | 2025-09-01 00:05:00 | 2025-09-01 01:00:00 | false |
| 103 | 1 | 12 | 2025-09-01 03:00:00 | 2025-09-01 04:00:00 | true |
+-----------+------------+---------+---------------------+---------------------+------------+

Sample data (view_events):
+---------+-----------+---------------------+--------+--------+---------+
| user_id | stream_id | ts | action | device | country |
+---------+-----------+---------------------+--------+--------+---------+
| 1001 | 101 | 2025-08-31 23:55:00 | start | mobile | US |
| 1001 | 101 | 2025-09-01 00:40:00 | stop | mobile | US |
| 1002 | 101 | 2025-09-01 00:00:00 | start | web | US |
| 1002 | 101 | 2025-09-01 00:20:00 | stop | web | US |
| 1001 | 102 | 2025-09-01 00:10:00 | start | mobile | US |
| 1001 | 102 | 2025-09-01 00:45:00 | stop | mobile | US |
| 1003 | 102 | 2025-09-01 00:20:00 | start | tv | CA |
| 1003 | 102 | 2025-09-01 00:50:00 | stop | tv | CA |
| 1002 | 103 | 2025-09-01 03:05:00 | start | web | US |
| 1002 | 103 | 2025-09-01 03:45:00 | stop | web | US |
+---------+-----------+---------------------+--------+--------+---------+

Assumptions to enforce in your SQL:
- Events can arrive out of order; de-duplicate exact duplicate rows.
- If a 'start' has no subsequent 'stop', treat its stop as LEAST(streams.ended_at, ts + interval '4 hours').
- Ignore any 'stop' that occurs before its last unmatched 'start' for the same user_id, stream_id.

Tasks:
A) Peak concurrency per stream: return stream_id, peak_concurrent_viewers, and the ts window where the peak occurs.
B) Multi-streaming overlaps on 2025-09-01: return user_id and the total minutes they watched two different streams concurrently for >= 5 minutes overlap windows.
C) 7-day new-viewer retention: considering users whose first-ever view occurred between 2025-08-24 and 2025-08-31, compute D+7 retention where a user is retained if they watch any stream for at least 2 minutes on their 7th day after first view. Assume today is 2025-09-01.
D) Top creators by average per-viewer watch time in the last 7 days (2025-08-26 to 2025-09-01): return creator_id and avg_watch_minutes among US viewers, restricted to creators with >= 100 unique US viewers; break ties by higher peak concurrency from Task A.

PracHub · Accepted Answer

This question evaluates a candidate's ability to author robust SQL for streaming analytics, covering de-duplication, event-time handling, sessionization, concurrency, overlap detection, and retention calculations on time-series view event data.

Quick Overview

Quick Overview