Analyze Streaming Data with SQL for Monthly Trends
Company: Twitch
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
minute_streamed
+---------------------+------------------+------------+--------------------+
| time_minute | streamer_username| category | concurrent_viewers |
+---------------------+------------------+------------+--------------------+
| 2020-03-19 13:00:00 | aaa | TTBHGD | 133 |
| 2020-03-19 13:01:00 | aaa | TTBHGD | 45 |
| 2020-03-20 21:01:00 | bbb | VGDH | 129 |
| 2020-03-30 22:15:00 | bbb | CCVF | 17 |
+---------------------+------------------+------------+--------------------+
minute_viewed
+---------------------+------------------+---------------+------------------+
| time_minute | viewer_username | viewer_country| streamer_username|
+---------------------+------------------+---------------+------------------+
| 2020-03-19 13:00:00 | ccc | US | aaa |
| 2020-03-19 13:01:00 | ccc | US | aaa |
| 2020-03-20 21:01:00 | ddd | JP | aaa |
| 2020-03-30 22:15:00 | ddd | JP | aaa |
+---------------------+------------------+---------------+------------------+
##### Scenario
You work for a live-streaming platform that stores one record per streamed minute and one record per viewed minute. Management wants several analytics reports over these tables.
##### Question
From table minute_streamed, compute the total hours streamed for every calendar month; return results ordered chronologically. Explain how your query still works when the data spans multiple years. 2. For each streamer, return
(a) their total streamed hours and
(b) the percentage of those hours that belong to a category containing a given keyword (case-insensitive match). 3. List all streamers whose total streamed hours in any month are greater than the previous month. Handle datasets that contain more than one year and months with no activity. 4. Using minute_streamed and minute_viewed, produce for each streamer
(a) their average concurrent_viewers in 2019 and
(b) the total minutes they were watched by viewers from the US in 2019.
##### Hints
Use EXTRACT or DATE_FORMAT to pull year/month, COUNT(*)/60 to convert minutes to hours, UPPER(...) LIKE, IFNULL for missing months, and self-join or window functions for month-over-month comparisons.
Quick Answer: This question evaluates proficiency in SQL time-series aggregation, date extraction and grouping across calendar months, join and conditional aggregation techniques, windowed month-over-month comparisons, and handling sparse or multi-year datasets; Category: Data Manipulation (SQL/Python); level: practical application.