You’re analyzing Home Improvement retail transactions to understand sales of Mulch during promotions. After joining multiple tables, your final result has far fewer rows than expected.
Tables
Assume the following schemas (all dates are in the same timezone; DATE has no time component):
sales
-
sale_id
(BIGINT, PK)
-
sale_date
(DATE)
-
store_id
(INT)
-
customer_id
(BIGINT)
sale_items
-
sale_id
(BIGINT, FK →
sales.sale_id
)
-
line_id
(INT) — unique within a sale
-
product_id
(BIGINT)
-
qty
(INT)
-
unit_price
(DECIMAL(10,2))
products
-
product_id
(BIGINT, PK)
-
category
(VARCHAR)
-
sub_category
(VARCHAR)
promotions
-
promo_id
(BIGINT, PK)
-
product_id
(BIGINT)
-
start_date
(DATE)
-
end_date
(DATE)
-
discount_pct
(DECIMAL(5,2))
Intended output
One row per day for the last 30 days:
-
sale_date
-
mulch_units
= total units of mulch sold (promo + non-promo)
-
mulch_revenue
= total revenue from mulch
-
promo_mulch_units
= mulch units sold while an applicable promo was active
Given (buggy) query
SELECT
s.sale_date,
SUM(i.qty) AS mulch_units,
SUM(i.qty * i.unit_price) AS mulch_revenue,
SUM(CASE WHEN p.promo_id IS NOT NULL THEN i.qty ELSE 0 END) AS promo_mulch_units
FROM sales s
JOIN sale_items i
ON s.sale_id = i.sale_id
JOIN products pr
ON pr.product_id = i.product_id
LEFT JOIN promotions p
ON p.product_id = i.product_id
AND s.sale_date BETWEEN p.start_date AND p.end_date
WHERE pr.category = 'Mulch'
AND p.discount_pct > 0
AND s.sale_date >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY 1;
Task
-
Explain
why
this query can return
too few rows / too few units
compared to expectations.
-
Describe a
step-by-step debugging approach
to isolate which join/filter is dropping rows (e.g., incremental counts after each join).
-
Provide a corrected SQL query that produces the
intended output
(including non-promo mulch sales).