Write one SQL for exam scores aggregation
Company: Capital One
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Take-home Project
You are given an exam database. Write a single SQL statement (CTEs allowed; one final statement only) that satisfies all three requirements below. You must use JOIN, WHERE, GROUP BY, aggregates, and handle NULLs (via ISNULL/COALESCE). Schema and tiny sample data:
Tables
- students(student_id INT, name VARCHAR)
- exams(exam_id INT, subject VARCHAR)
- scores(score_id INT, student_id INT, exam_id INT, attempt_no INT, score INT NULL, taken_at DATE)
Sample rows
students
+------------+-------+
| student_id | name |
+------------+-------+
| 1 | Alice |
| 2 | Bob |
| 3 | Chen |
| 4 | Diego |
+------------+-------+
exams
+---------+---------+
| exam_id | subject |
+---------+---------+
| 10 | Math |
| 20 | English |
+---------+---------+
scores
+----------+------------+---------+------------+-------+------------+
| score_id | student_id | exam_id | attempt_no | score | taken_at |
+----------+------------+---------+------------+-------+------------+
| 100 | 1 | 10 | 1 | 58 | 2025-05-03 |
| 101 | 1 | 10 | 2 | 72 | 2025-05-10 |
| 102 | 2 | 10 | 1 | NULL | 2025-05-07 |
| 103 | 2 | 10 | 2 | 39 | 2025-05-14 |
| 104 | 3 | 10 | 1 | 91 | 2025-05-22 |
| 105 | 4 | 20 | 1 | 84 | 2025-05-05 |
| 106 | 1 | 20 | 1 | 88 | 2025-05-18 |
| 107 | 2 | 20 | 1 | 37 | 2025-05-19 |
| 108 | 3 | 20 | 1 | NULL | 2025-04-29 |
| 109 | 3 | 20 | 2 | 61 | 2025-05-03 |
+----------+------------+---------+------------+-------+------------+
Consider only attempts in May 2025 (2025-05-01 to 2025-05-31 inclusive).
Requirement A (subject-level metrics): Return one row per subject, including subjects with zero May attempts, with these columns:
- subject
- high_scorer_cnt: count of distinct students whose best May score (max per student per subject) is >= 85.
- avg_score_floored: average over all May attempts after transforming score as: t = COALESCE(score, 0); if t < 40 then use 40 else use t.
- low_tag_pct: percentage (0–100, one decimal) of May attempts where COALESCE(score, 0) < 60.
If a subject has zero May attempts, return 0 for the three metrics.
Requirement B (student-by-subject labels): For every (student, subject) pair with at least one May attempt, return:
- subject, student_id, pass_flag where pass_flag = 'Pass' if the student's best May score >= 60 else 'Remedial'.
Requirement C (single statement): Produce a single final result by UNION ALL the two result sets, and add a column level with values 'subject' for Requirement A rows and 'student_subject' for Requirement B rows. For rows where a column is inapplicable, return NULL.
Edge cases to handle: NULL scores, subjects with no May data, students with only NULL or sub-40 attempts, and ensuring distinct students for high_scorer_cnt.
Quick Answer: This question evaluates proficiency in SQL data manipulation and reporting, specifically joins, GROUP BY and aggregate functions, NULL handling (COALESCE/ISNULL), conditional aggregation and set operations to produce combined result sets.