PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches

Quick Overview

This question evaluates advanced SQL competency for streaming analytics — covering time-series aggregation, monthly bucketing across year boundaries, joins between event tables, window functions, and case-insensitive substring filtering; the domain is Data Manipulation (SQL/Python) and the level is practical application of SQL techniques.

  • Medium
  • Twitch
  • Data Manipulation (SQL/Python)
  • Data Scientist

Solve advanced SQL for streaming analytics

Company: Twitch

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

You are given minute-level streaming telemetry. Each row in minute_streamed represents one minute during which a streamer was live; each row in minute_viewed represents one viewer watching a streamer for one minute. Schema - minute_streamed(time_minute TIMESTAMP, streamer_username VARCHAR, category VARCHAR, concurrent_viewers INT) - minute_viewed(time_minute TIMESTAMP, viewer_username VARCHAR, viewer_country VARCHAR, streamer_username VARCHAR) Sample data (small and illustrative) minute_streamed +---------------------+------------------+----------+--------------------+ | time_minute | streamer_username| category | concurrent_viewers | +---------------------+------------------+----------+--------------------+ | 2018-12-31 23:59:00 | aaa | TTBHGD | 100 | | 2019-01-01 00:00:00 | aaa | TTBHGD | 120 | | 2019-01-01 00:01:00 | aaa | TTBHGD | 130 | | 2019-03-15 09:00:00 | bbb | VGDH | 50 | | 2019-03-15 09:01:00 | bbb | VGDH | 60 | | 2019-03-31 22:15:00 | bbb | CCVF | 70 | | 2020-03-19 13:00:00 | aaa | TTBHGD | 133 | | 2020-03-19 13:01:00 | aaa | TTBHGD | 45 | +---------------------+------------------+----------+--------------------+ minute_viewed +---------------------+----------------+----------------+------------------+ | time_minute | viewer_username| viewer_country | streamer_username| +---------------------+----------------+----------------+------------------+ | 2019-01-01 00:00:00 | ccc | US | aaa | | 2019-01-01 00:01:00 | ccc | US | aaa | | 2019-03-15 09:00:00 | ddd | JP | bbb | | 2019-03-15 09:00:00 | eee | US | bbb | | 2019-03-15 09:01:00 | eee | US | bbb | | 2019-03-31 22:15:00 | fff | US | bbb | | 2020-03-19 13:00:00 | ccc | US | aaa | +---------------------+----------------+----------------+------------------+ Assumptions - A streamer may switch categories at any minute. - Months are calendar months based on time_minute (UTC). Label months as YYYY-MM. - 1 row in minute_streamed = 1 minute streamed; hours = minutes/60. - You may use ANSI SQL with CTEs and window functions. Tasks 1) Compute total monthly hours streamed for each month across all streamers. Output (month_yyyy_mm, hours_streamed). Order chronologically. Ensure it works when data spans multiple years (e.g., 2018-12 and 2019-01). 2) For each streamer, return their total streamed minutes and the share accounted for categories whose name contains a given keyword, case-insensitive. Input parameter: cat_keyword VARCHAR. Output (streamer_username, total_minutes, keyword_minutes, keyword_share = keyword_minutes/total_minutes). Treat category matching as case-insensitive substring search. 3) Find, for each streamer and month, whether their streamed hours increased versus the immediately previous calendar month. Treat a missing previous month as 0 minutes (so if a streamer did not stream in February but streamed in March, March counts as an increase over February=0). Handle year boundaries correctly (e.g., compare 2019-01 to 2018-12). Output rows only for months where current_month_minutes > prev_month_minutes: (streamer_username, month_yyyy_mm, current_minutes, prev_minutes). 4) For the year 2019 only, return for each streamer: - avg_concurrent_viewers_2019: the average of concurrent_viewers over that streamer's streamed minutes in 2019 (use minute_streamed only). - us_viewer_minutes_2019: the total number of viewer-minutes from US viewers in 2019. Join minute_viewed to minute_streamed on both (streamer_username, time_minute) so you don’t multiply counts. Output (streamer_username, avg_concurrent_viewers_2019, us_viewer_minutes_2019). As a follow-up, also compute unique_streaming_minutes_with_us_2019: the count of distinct streamed minutes in 2019 where the streamer had at least one US viewer. 5) Conceptual: Briefly explain SQL’s logical query processing order (FROM/JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY) and answer: logically, does AVG() happen before or after JOIN? Deliverables - Provide SQL for tasks 1–4 and a 1–2 sentence explanation for task 5. Ensure your queries are robust to mixed-case categories and months with no activity.

Quick Answer: This question evaluates advanced SQL competency for streaming analytics — covering time-series aggregation, monthly bucketing across year boundaries, joins between event tables, window functions, and case-insensitive substring filtering; the domain is Data Manipulation (SQL/Python) and the level is practical application of SQL techniques.

Last updated: Mar 29, 2026

Related Coding Questions

  • Write robust SQL for streaming analytics - Twitch (Medium)
  • Analyze Streaming Data with SQL for Monthly Trends - Twitch (Medium)

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.