Write SQL filtering, grouping, CASE, UNION tasks
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: HR Screen
Use the following schema and sample data to answer all parts. Assume standard ANSI SQL and that amounts are DECIMAL.
Table: orders
+----------+---------+---------+---------+------------+----------+
| order_id | user_id | channel | amount  | created_at | status   |
+----------+---------+---------+---------+------------+----------+
| 1        | 1       | web     | 19.99   | 2025-08-29 | paid     |
| 2        | 1       | store   | 10.00   | 2025-09-01 | paid     |
| 3        | 2       | web     | 5.00    | 2025-09-01 | paid     |
| 4        | 2       | store   | 5.00    | 2025-09-01 | pending  |
| 5        | 3       | web     | 100.49  | 2025-09-01 | paid     |
| 6        | 3       | web     | 3.01    | 2025-09-02 | refunded |
+----------+---------+---------+---------+------------+----------+
a) Filter with WHERE: Return order_id for all orders on 2025-09-01 that are paid and have amount >= 10. Only use a row-level filter (WHERE). List the result set.
b) Round down totals: For each user_id, compute the total paid amount across all dates and round down to whole dollars. Return columns (user_id, floor_paid_total) using FLOOR on the aggregated sum, not on individual rows. Explain why FLOOR(SUM(amount)) differs from SUM(FLOOR(amount)).
c) CASE WHEN bucketing: Add a value_tier column to each order: amount < 10 -> 'low', 10 <= amount < 100 -> 'mid', amount >= 100 -> 'high'. For orders on 2025-09-01 only, return value_tier and COUNT(*), ordered by tier. Be explicit about inclusive/exclusive bounds.
d) Aggregate with GROUP BY and HAVING: For each user_id, count paid orders with created_at <= '2025-09-01' and return only users with at least 2 such orders. Provide the exact SQL and the resulting rows.
e) UNION vs UNION ALL efficiency and correctness: From the same orders table, construct two queries: one listing the user_id of each user who ordered on the web, and one listing the user_id of each user who ordered in store (filter on created_at = '2025-09-01' in both subqueries). First combine them with UNION and report the row count; then combine them with UNION ALL and report the row count. Which operator is typically more efficient, and why? When would UNION be required despite the performance difference? Identify the double-counting risk if you used UNION ALL to count unique users across channels.
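Parts (a) through (d) can be checked against the sample data with a minimal sketch like the one below. It rests on several assumptions that are not in the question: an in-memory SQLite database stands in for an ANSI SQL engine, amounts are stored as REAL rather than DECIMAL, dates are ISO-formatted text (so string comparison matches date order), and FLOOR is registered as a small shim because some SQLite builds do not ship it. "Ordered by tier" is read here as low, mid, high rather than alphabetical.

```python
import math
import sqlite3

# Assumption: in-memory SQLite stands in for an ANSI SQL engine.
conn = sqlite3.connect(":memory:")
# Illustrative shim: register FLOOR in case this SQLite build lacks it.
conn.create_function("FLOOR", 1, lambda x: None if x is None else math.floor(x))
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, user_id INTEGER, channel TEXT,"
    " amount REAL, created_at TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "web", 19.99, "2025-08-29", "paid"),
        (2, 1, "store", 10.00, "2025-09-01", "paid"),
        (3, 2, "web", 5.00, "2025-09-01", "paid"),
        (4, 2, "store", 5.00, "2025-09-01", "pending"),
        (5, 3, "web", 100.49, "2025-09-01", "paid"),
        (6, 3, "web", 3.01, "2025-09-02", "refunded"),
    ],
)

# (a) Row-level filter only: date, status, and an inclusive lower bound.
part_a = conn.execute(
    "SELECT order_id FROM orders"
    " WHERE created_at = '2025-09-01' AND status = 'paid' AND amount >= 10"
    " ORDER BY order_id"
).fetchall()

# (b) FLOOR is applied once, to the aggregated sum per user.
part_b = conn.execute(
    "SELECT user_id, FLOOR(SUM(amount)) AS floor_paid_total FROM orders"
    " WHERE status = 'paid' GROUP BY user_id ORDER BY user_id"
).fetchall()

# (c) Bucket in a derived table, then count; bounds are [.., 10), [10, 100), [100, ..).
part_c = conn.execute(
    "SELECT value_tier, COUNT(*) AS n FROM ("
    "  SELECT CASE WHEN amount < 10 THEN 'low'"
    "              WHEN amount < 100 THEN 'mid'"
    "              ELSE 'high' END AS value_tier"
    "  FROM orders WHERE created_at = '2025-09-01') AS t"
    " GROUP BY value_tier"
    " ORDER BY CASE value_tier WHEN 'low' THEN 1 WHEN 'mid' THEN 2 ELSE 3 END"
).fetchall()

# (d) WHERE filters rows before grouping; HAVING filters the groups.
part_d = conn.execute(
    "SELECT user_id, COUNT(*) AS paid_orders FROM orders"
    " WHERE status = 'paid' AND created_at <= '2025-09-01'"
    " GROUP BY user_id HAVING COUNT(*) >= 2"
).fetchall()

print(part_a)  # [(2,), (5,)]
print(part_b)  # [(1, 29), (2, 5), (3, 100)]
print(part_c)  # [('low', 2), ('mid', 1), ('high', 1)]
print(part_d)  # [(1, 2)]
```

On this sample data FLOOR(SUM(amount)) and SUM(FLOOR(amount)) happen to agree for every user, because no user's fractional parts add up to a whole dollar. They diverge as soon as they do: two payments of 0.60 give FLOOR(1.20) = 1 in the first form but 0 + 0 = 0 in the second, since flooring each row discards its fraction before the sum.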
Quick Answer: This question tests SQL filtering, aggregation, conditional logic, and set operations: WHERE filtering, GROUP BY/HAVING, CASE bucketing, FLOOR on aggregates, and UNION versus UNION ALL. It falls within the Data Manipulation (SQL/Python) domain and emphasizes practical application of ANSI SQL.
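For part (e), the UNION versus UNION ALL comparison can be sketched the same way. The setup below is an assumption for illustration (in-memory SQLite, amounts as REAL, dates as ISO text); the variable names are hypothetical. UNION ALL is typically cheaper because it simply concatenates the two inputs, while UNION must also remove duplicates, which usually costs a sort or hash step. UNION (or an explicit DISTINCT) is still required whenever the result must be a set of unique values.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for an ANSI SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, user_id INTEGER, channel TEXT,"
    " amount REAL, created_at TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "web", 19.99, "2025-08-29", "paid"),
        (2, 1, "store", 10.00, "2025-09-01", "paid"),
        (3, 2, "web", 5.00, "2025-09-01", "paid"),
        (4, 2, "store", 5.00, "2025-09-01", "pending"),
        (5, 3, "web", 100.49, "2025-09-01", "paid"),
        (6, 3, "web", 3.01, "2025-09-02", "refunded"),
    ],
)

# The two per-channel subqueries; the question applies no status filter here.
web = "SELECT user_id FROM orders WHERE channel = 'web' AND created_at = '2025-09-01'"
store = "SELECT user_id FROM orders WHERE channel = 'store' AND created_at = '2025-09-01'"

# UNION removes duplicates across both inputs.
union_rows = conn.execute(f"{web} UNION {store}").fetchall()

# UNION ALL keeps every row, so a user active in both channels appears twice.
union_all_rows = conn.execute(f"{web} UNION ALL {store}").fetchall()

# Counting unique users on top of UNION ALL needs an explicit DISTINCT.
distinct_users = conn.execute(
    f"SELECT COUNT(DISTINCT user_id) FROM ({web} UNION ALL {store}) AS u"
).fetchone()[0]

print(len(union_rows))      # 3
print(len(union_all_rows))  # 4
print(distinct_users)       # 3
```

User 2 ordered through both channels on 2025-09-01, so a plain COUNT(*) over the UNION ALL result would report 4 users instead of 3. That is the double-counting risk the question asks about.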