Tiktok DS Interview Questions
Company: TikTok
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Scenario:
You are provided with two tables: `minute_streamed` and `minute_viewed`. The `minute_streamed` table records each minute of streaming activity, while the `minute_viewed` table records each minute of viewer activity.
`minute_streamed` table structure:
+---------------------+-------------------+----------+--------------------+
| time_minute | streamer_username | category | concurrent_viewers |
+---------------------+-------------------+----------+--------------------+
| 2020-03-19 13:00:00 | aaa | TTBHGD | 133 |
| 2020-03-19 13:01:00 | aaa | TTBHGD | 45 |
| 2020-03-20 21:01:00 | bbb | VGDH | 129 |
| 2020-03-30 22:15:00 | bbb | CCVF | 17 |
+---------------------+-------------------+----------+--------------------+
`minute_viewed` table structure:
+---------------------+-----------------+----------------+-------------------+
| time_minute | viewer_username | viewer_country | streamer_username |
+---------------------+-----------------+----------------+-------------------+
| 2020-03-19 13:00:00 | ccc | US | aaa |
| 2020-03-19 13:01:00 | ccc | US | aaa |
| 2020-03-20 21:01:00 | ddd | JP | aaa |
| 2020-03-30 22:15:00 | ddd | JP | aaa |
+---------------------+-----------------+----------------+-------------------+
Questions:
Q1: Calculate the total streamed hours for each month.
Write a query to sum the streamed minutes for each month and convert it to hours.
Include a solution for when data spans multiple years.
Q2: Determine the total streaming duration of each streamer along with the ratio of a specific category to the total duration.
Describe how to handle queries when the category varies using keywords and case sensitivity.
Q3: Identify streamers who streamed more in a given month compared to the previous month.
Outline how to deal with different years and handle cases where no data is available for a month (e.g., NULL values).
Q4: Compute the average concurrent viewers for each streamer in 2019 and the total view time from US viewers.
Use the provided code for reference and evaluate its correctness:
```sql
SELECT s.streamer_username, AVG(s.concurrent_viewers) AS avg_viewers, SUM(CASE WHEN v.viewer_country = 'US' THEN 1 ELSE 0 END) AS us_viewer_time
FROM minute_streamed s
JOIN minute_viewed v ON s.streamer_username = v.streamer_username
WHERE YEAR(s.time_minute) = '2019'
GROUP BY s.streamer_username
Hints:
Consider how to format dates and handle string operations in SQL.
Use window functions if beneficial for solving part of the problem.
Explore using conditional statements or arithmetic to handle NULL values and different yearly datasets.
Quick Answer: This question evaluates proficiency in SQL-based time-series aggregation, joins between streaming and viewing tables, conditional and null-aware aggregations, string and case-insensitive category filtering, and basic analytical functions used in data science workflows.