Diagnose MySQL joins and GROUP BY/HAVING errors
Company: Amazon
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Take-home Project
You are using MySQL 8.0 with ONLY_FULL_GROUP_BY enabled. Answer all parts precisely. Provide the exact SQL you would run and the final result shapes/values where requested.
Schema (use these small samples exactly as given):
Table A
+----+------+
| id | valA |
+----+------+
| 1 | 'x' |
| 2 | 'y' |
| 3 | 'z' |
| 4 | NULL|
+----+------+
Table B
+----+------+
| id | valB |
+----+------+
| 2 | 'p' |
| 3 | 'q' |
| 3 | 'r' |
| 5 | 's' |
|NULL| 't' |
+----+------+
Table t
+------+------+
| a | b |
+------+------+
| 1 | 'u' |
| 1 | 'v' |
| 1 | 'v' |
| 2 | 'w' |
| NULL | 'x' |
+------+------+
Table users
+---------+---------+
| user_id | name |
+---------+---------+
| 1 | 'Ann' |
| 2 | 'Bob' |
| 3 | 'Cara' |
+---------+---------+
Table orders
+----------+---------+---------------------+--------------+
| order_id | user_id | created_at | amount_cents |
+----------+---------+---------------------+--------------+
| 10 | 1 | '2025-08-30 10:00' | 1200 |
| 11 | 1 | '2025-08-31 09:30' | 300 |
| 12 | 2 | '2025-09-01 12:45' | 4500 |
+----------+---------+---------------------+--------------+
Table events
+-----------+------------------+
| event_id | duration_seconds |
+-----------+------------------+
| 100 | 59 |
| 101 | 61 |
| 102 | 90061 |
+-----------+------------------+
Table sessions
+------------+---------+---------------------+---------------------+
| session_id | user_id | login_at | logout_at |
+------------+---------+---------------------+---------------------+
| 1000 | 1 | '2025-08-31 23:50' | '2025-09-01 00:10:40'|
| 1001 | 2 | '2025-09-01 09:00' | '2025-09-01 09:00' |
| 1002 | 3 | '2025-09-01 10:15' | NULL |
+------------+---------+---------------------+---------------------+
A) Explain precisely what LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, and CROSS JOIN mean in terms of set semantics and duplicate handling. Then, using A and B:
1) Write queries for each join type (note: MySQL lacks FULL OUTER JOIN—emulate it with correct SQL).
2) For each join type, give the exact row count returned and list the rows for the first three results when ordered by COALESCE(A.id, B.id), B.valB, A.valA. Explain how NULLs affect equality joins here.
B) For the following MySQL queries over table t, indicate for each whether it executes or errors under ONLY_FULL_GROUP_BY. If it errors, state the exact reason (e.g., nonaggregated column not in GROUP BY, clause order, reserved word usage). Do not change the text.
Q1: SELECT a, COUNT(*) FROM t GROUP BY a HAVING COUNT(*) > 2;
Q2: SELECT a, b, COUNT(*) FROM t GROUP BY a HAVING COUNT(*) > 2;
Q3: SELECT a, COUNT(*) AS cnt FROM t HAVING cnt > 2 GROUP BY a ORDER BY 1,2;
Q4: SELECT a, COUNT(*) AS table FROM t GROUP BY a HAVING table > 2 ORDER BY 1,2;
Q5: SELECT a, COUNT(*) AS `table` FROM t GROUP BY a HAVING `table` > 2 ORDER BY 1,2;
C) Write a single query that returns, for every user (including those with zero orders): user_id, name, orders_count, total_spend_dollars (amount_cents/100 with two decimals). Sort by total_spend_dollars DESC then user_id ASC. Ensure users with no orders appear with 0 and 0.00.
D) Convert duration_seconds in events to a human-readable string exactly in the form 'X days Y hours Z minutes W seconds' using integer arithmetic (no loops/UDFs). For example, 90061 should render '1 days 1 hours 1 minutes 1 seconds'. Provide the SELECT that produces both the original seconds and the string.
E) Using sessions, output session_id, user_id, hhmmss_duration (as TIME) and duration_seconds (as INT). Use TIMEDIFF and TIME_TO_SEC. Treat NULL logout_at as exclude-from-output (i.e., only return rows with non-NULL logout_at). Also explain how your query would change if you had to treat NULL logout_at as NOW().
Quick Answer: This question evaluates mastery of SQL join semantics (LEFT, RIGHT, FULL OUTER emulation, CROSS), NULL equality and duplicate handling, GROUP BY/HAVING behavior under ONLY_FULL_GROUP_BY, aggregation result shaping and type/format conversions in the Data Manipulation (SQL/Python) domain.