Write SQL to verify quarterly sales decline
Company: Fetch Rewards
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You are given a single dataset for a beverage company and must decide whether Q2 2024 truly declined. Write one standard SQL query that produces a comparison table with overall and same-store metrics, guarding against grain mistakes and misleading averages.
Schema:
- transactions(transaction_id BIGINT, sale_date DATE, store_id INT, product_id INT, units INT, revenue DECIMAL(10,2), is_return BOOLEAN)
- stores(store_id INT, region VARCHAR, opened_date DATE, closed_date DATE NULL)
- products(product_id INT, category VARCHAR)
Sample data (minimal, illustrative):
transactions
+---------------+------------+----------+------------+-------+---------+-----------+
| transaction_id| sale_date | store_id | product_id | units | revenue | is_return |
+---------------+------------+----------+------------+-------+---------+-----------+
| 1 | 2024-03-30 | 101 | 11 | 10 | 50.00 | false |
| 2 | 2024-04-02 | 101 | 11 | 8 | 44.00 | false |
| 3 | 2024-04-15 | 102 | 12 | 5 | 40.00 | false |
| 4 | 2024-05-01 | 103 | 11 | -1 | -5.50 | true |
| 5 | 2024-06-20 | 101 | 12 | 7 | 63.00 | false |
| 6 | 2023-06-10 | 101 | 11 | 9 | 45.00 | false |
+---------------+------------+----------+------------+-------+---------+-----------+
stores
+---------+--------+-------------+-------------+
| store_id| region | opened_date | closed_date |
+---------+--------+-------------+-------------+
| 101 | West | 2020-01-01 | NULL |
| 102 | East | 2021-05-10 | NULL |
| 103 | North | 2024-05-01 | 2024-07-15 |
+---------+--------+-------------+-------------+
products
+------------+----------+
| product_id | category |
+------------+----------+
| 11 | Soda |
| 12 | Juice |
+------------+----------+
Requirements:
1) Define quarters as: Q1 2024 = 2024-01-01..2024-03-31; Q2 2024 = 2024-04-01..2024-06-30; Q2 2023 = 2023-04-01..2023-06-30.
2) Produce two comparison rows: 'Q2_2024_vs_Q1_2024' and 'Q2_2024_vs_Q2_2023'.
3) For each comparison, output: total_revenue_period_a, total_revenue_period_b, revenue_pct_change, total_units_period_a, total_units_period_b, units_pct_change, avg_price_period_a (sum(revenue)/nullif(sum(units),0)), avg_price_period_b, same_store_revenue_pct_change (restrict stores open for the entirety of both periods: opened_date <= period_start AND (closed_date IS NULL OR closed_date >= period_end)), active_store_count_a, active_store_count_b, decline_flag (true if revenue_pct_change < -5% AND units_pct_change < -5%).
4) Treat returns as negative amounts (no special filtering). Avoid averaging daily averages; compute period-level sums first, then ratios. Ensure correct grain (day x store x product) so that joins do not duplicate rows.
5) The query should be ANSI SQL and run without UDFs; use CTEs and window functions if helpful.
Quick Answer: This question evaluates proficiency in SQL aggregation, time-series period-over-period comparison, same-store cohort analysis, correct grain handling, and treatment of returns, testing data manipulation and analytical SQL skills in the Data Manipulation (SQL/Python) domain.