Understand SQL Aggregations and Joins: Key Differences Explained
Company: Fannie Mae
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
Employees
+----+--------+--------+---------+
| id | name | salary | dept_id |
+----+--------+--------+---------+
| 1 | Alice | 70000 | 1 |
| 2 | Bob | 90000 | 2 |
| 3 | Carol | 80000 | 1 |
+----+--------+--------+---------+
Departments
+----+------------+
| id | dept_name |
+----+------------+
| 1 | Finance |
| 2 | Operations |
| 3 | Marketing |
+----+------------+
##### Scenario
Querying and manipulating enterprise data stored in relational databases and pandas data frames.
##### Question
Explain the purpose and differences of COUNT, SUM, AVG, MIN, and MAX aggregation functions in SQL. Describe INNER, LEFT, RIGHT, and FULL OUTER joins and give a use-case for each. Contrast UNION and UNION ALL. When is each preferable? What is a window function in SQL and how does it differ from a standard aggregation? Compare SQL views to physical tables. List advantages and disadvantages. How can you hide duplicate rows in a result set without deleting them? How do you permanently remove duplicate rows from a table? Write an SQL query to solve LeetCode #177 (Nth Highest Salary). List practical techniques to improve SQL query efficiency. In pandas, compare merge, join, and concat. When should each be used?
##### Hints
Expect to write efficient SQL, reason about duplicates, and pick the right pandas operation.
Quick Answer: This question evaluates a candidate's competency in SQL aggregations (COUNT, SUM, AVG, MIN, MAX), join semantics (INNER, LEFT, RIGHT, FULL OUTER), set operations (UNION vs UNION ALL), window functions, view versus table trade-offs, duplicate detection and removal, query optimization techniques, and pandas DataFrame operations (merge, join, concat) within the Data Manipulation (SQL/Python) domain. It is commonly asked because it assesses both conceptual understanding (function purposes and join semantics) and practical application (writing efficient queries, deduplicating data, and translating patterns between SQL and pandas), thereby probing correctness and performance reasoning.