Count buggy vs non-buggy by employer
Company: TikTok
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Take-home Project
Count buggy vs non-buggy submissions for each employer_id, including employers with zero submissions. Return employer_id, buggy_count, non_buggy_count, ordered by employer_id. Write a single SQL query using conditional aggregation. Also show how you would adapt it if submission status were a string column instead of boolean. Schema and sample data:
Tables
- employers(employer_id INT PRIMARY KEY, name VARCHAR(100) NOT NULL)
- submissions(submission_id INT PRIMARY KEY, employer_id INT NOT NULL, created_at DATE NOT NULL, is_buggy BOOLEAN NOT NULL)
Sample rows (employers)
employer_id | name
1 | Acme
2 | Globex
3 | Initech
Sample rows (submissions)
submission_id | employer_id | created_at | is_buggy
10 | 1 | 2025-08-20 | TRUE
11 | 1 | 2025-08-21 | FALSE
12 | 1 | 2025-08-22 | TRUE
13 | 2 | 2025-08-23 | FALSE
14 | 2 | 2025-08-24 | FALSE
Expected result
employer_id | buggy_count | non_buggy_count
1 | 2 | 1
2 | 0 | 2
3 | 0 | 0
Sub-questions:
- Provide the LEFT JOIN + GROUP BY solution with SUM(CASE WHEN is_buggy THEN 1 ELSE 0 END) and SUM(CASE WHEN NOT is_buggy THEN 1 ELSE 0 END).
- Show the variant when status is a STRING column status IN ('buggy','non_buggy').
- Explain how your query remains correct if a new employer is added with no submissions.
Quick Answer: This question evaluates proficiency in SQL data manipulation—particularly conditional aggregation and join semantics—for counting categorized records across related tables in a data science context.