PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates proficiency in SQL and pandas data-manipulation techniques, including aggregations with NULL semantics, various join types, set operations and deduplication, window functions, and reconciling normalized and denormalized sources.

  • Medium
  • Fannie Mae
  • Data Manipulation (SQL/Python)
  • Data Scientist

Assess SQL joins, unions, windows, dedup, and pandas

Company: Fannie Mae

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Using the schema and sample data below, answer all sub-questions. Unless stated, assume ANSI SQL and explain any dialect-specific choices. Schema: - customers(customer_id PK, name) - orders(order_id PK, customer_id FK->customers, order_date DATE, amount DECIMAL(10,2)) - order_items(item_id PK, order_id FK->orders, product_id FK->products, qty INT, price DECIMAL(10,2)) - products(product_id PK, name, category) - web_orders(order_id, customer_id, amount) - store_orders(order_id, customer_id, amount) - employees(emp_id PK, name, salary INT) Sample tables (tiny): customers | customer_id | name | |------------:|-------| | 1 | Alice | | 2 | Bob | | 3 | Carol | orders | order_id | customer_id | order_date | amount | |---------:|------------:|-------------|---------| | 101 | 1 | 2025-08-28 | 120.00 | | 102 | 1 | 2025-08-30 | 80.00 | | 103 | 2 | 2025-08-29 | 50.00 | | 105 | 3 | 2025-08-30 | NULL | order_items | item_id | order_id | product_id | qty | price | |--------:|---------:|-----------:|----:|--------| | 1 | 101 | 10 | 1 | 120.00 | | 2 | 102 | 11 | 2 | 40.00 | | 3 | 103 | 12 | 1 | 50.00 | products | product_id | name | category | |-----------:|-------|--------------| | 10 | Phone | Electronics | | 11 | Cable | Electronics | | 12 | Book | Media | web_orders | order_id | customer_id | amount | |---------:|------------:|-------:| | 201 | 1 | 50.00 | | 202 | 1 | 50.00 | | 202 | 1 | 50.00 | store_orders | order_id | customer_id | amount | |---------:|------------:|-------:| | 301 | 1 | 50.00 | | 202 | 1 | 50.00 | | 302 | 2 | 50.00 | employees | emp_id | name | salary | |------:|-------|-------:| | 1 | Alice | 100 | | 2 | Bob | 200 | | 3 | Carol | 300 | | 4 | Dave | 300 | Tasks: 1) Aggregations and NULLs: Write queries that compute per-customer: (a) COUNT(*), COUNT(amount), SUM(amount), AVG(amount), MIN/MAX(order_date). Show the result difference caused by orders.amount being NULL for customer_id=3. Explain COUNT(column) vs COUNT(*), and how AVG ignores NULLs. Also compute total revenue from order_items using SUM(qty*price) and reconcile it to SUM(orders.amount); explain any mismatch and how to detect inconsistencies. 2) Joins: (a) INNER JOIN customers↔orders to list customers who placed orders. (b) LEFT JOIN customers↔orders to include customers with zero orders; return 0 as totals for those. (c) FULL OUTER JOIN customers↔orders and explain which rows appear only on one side. (d) CROSS JOIN products↔(SELECT DISTINCT category) to illustrate Cartesian output and how to limit it. Provide queries and one or two expected result rows to prove understanding. 3) UNION vs UNION ALL: Combine web_orders and store_orders into a single set of (order_id, customer_id, amount). (a) Using UNION ALL, count total rows and the number of duplicate order_ids. (b) Using UNION (distinct), show deduplicated rows. (c) Return only the order_ids that appear in both sources without using INTERSECT. 4) Window functions: (a) For each customer, rank their orders by amount DESC and return the top 1 per customer using ROW_NUMBER. (b) Compute a running total of amount per customer ordered by order_date; show partition boundaries. (c) For August 2025, compute each customer’s rank by monthly spend using SUM(amount) OVER(PARTITION BY customer_id) and then DENSE_RANK across customers. Explain why a window does not collapse rows like GROUP BY. 5) View vs table: Create a view top_spenders_2025_08(customer_id, total_amount) over orders. Is it updatable? Under what conditions? What happens if the base table adds a NOT NULL column or if the view definition references a column later dropped? Discuss pros/cons of views vs materialized tables for this case and how to refresh a materialized view (name your chosen RDBMS). 6) Hide duplicated rows (select-time): Return one row per distinct (order_id, customer_id, amount) from web_orders without deleting data. Show two approaches: DISTINCT and ROW_NUMBER() filtering; discuss performance and which preserves deterministic choice. 7) Remove duplicated rows (data-change): In web_orders, delete true duplicates keeping the lowest ROW_NUMBER by (order_id, customer_id, amount) and explain how you’d do this safely in a transaction with a reproducible tie-breaker. Provide the DELETE…CTE you would run. 8) LC #177 (Nth highest salary): Using employees(name, salary), write a query that returns the Nth highest distinct salary given a parameter :n. Show outputs for n=1,2,3 on the sample data and explain behavior when n exceeds the number of distinct salaries. 9) Improve SQL efficiency: Propose an indexing strategy for the above workloads (consider orders(customer_id, order_date), order_items(order_id), and covering indexes). Show how to rewrite one query to be sargable (e.g., avoid functions on indexed columns), and explain when EXISTS outperforms IN and when a window function can replace a self-join. Describe how you’d verify improvements using an execution plan and timing. 10) Python pandas merge vs join vs concat: Given DataFrames below, specify the exact pandas calls (with how=, on=, axis=, ignore_index=, validate=) to: (a) left-join spend onto cities; (b) inner-join both then right-join to keep ids only in df_b; (c) stack df_a and a new rows frame vertically; (d) align on index instead of column; (e) explain when concat vs merge is appropriate. DataFrames: df_a | id | city | |---:|------| | 1 | SF | | 2 | NYC | | 3 | LA | df_b | id | spend | |---:|------:| | 2 | 80 | | 3 | 20 | | 4 | 10 | df_c | id | tag | |---:|-----| | 3 | vip |

Quick Answer: This question evaluates proficiency in SQL and pandas data-manipulation techniques, including aggregations with NULL semantics, various join types, set operations and deduplication, window functions, and reconciling normalized and denormalized sources.

Last updated: Mar 29, 2026

Related Coding Questions

  • Understand SQL Aggregations and Joins: Key Differences Explained - Fannie Mae (Medium)

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.