Debug SQL join that drops rows
Company: Homedepot
Role: Product Analyst
Category: Data Manipulation (SQL/Python)
Difficulty: easy
Interview Round: Onsite
You’re analyzing Home Improvement retail transactions to understand sales of **Mulch** during promotions. After joining multiple tables, your final result has **far fewer rows** than expected.
## Tables
Assume the following schemas (all dates are in the same timezone; `DATE` has no time component):
### `sales`
- `sale_id` (BIGINT, PK)
- `sale_date` (DATE)
- `store_id` (INT)
- `customer_id` (BIGINT)
### `sale_items`
- `sale_id` (BIGINT, FK → `sales.sale_id`)
- `line_id` (INT) — unique within a sale
- `product_id` (BIGINT)
- `qty` (INT)
- `unit_price` (DECIMAL(10,2))
### `products`
- `product_id` (BIGINT, PK)
- `category` (VARCHAR)
- `sub_category` (VARCHAR)
### `promotions`
- `promo_id` (BIGINT, PK)
- `product_id` (BIGINT)
- `start_date` (DATE)
- `end_date` (DATE)
- `discount_pct` (DECIMAL(5,2))
## Intended output
One row per day for the last 30 days:
- `sale_date`
- `mulch_units` = total units of mulch sold (promo + non-promo)
- `mulch_revenue` = total revenue from mulch
- `promo_mulch_units` = mulch units sold while an applicable promo was active
## Given (buggy) query
```sql
SELECT
s.sale_date,
SUM(i.qty) AS mulch_units,
SUM(i.qty * i.unit_price) AS mulch_revenue,
SUM(CASE WHEN p.promo_id IS NOT NULL THEN i.qty ELSE 0 END) AS promo_mulch_units
FROM sales s
JOIN sale_items i
ON s.sale_id = i.sale_id
JOIN products pr
ON pr.product_id = i.product_id
LEFT JOIN promotions p
ON p.product_id = i.product_id
AND s.sale_date BETWEEN p.start_date AND p.end_date
WHERE pr.category = 'Mulch'
AND p.discount_pct > 0
AND s.sale_date >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY 1;
```
## Task
1. Explain **why** this query can return **too few rows / too few units** compared to expectations.
2. Describe a **step-by-step debugging approach** to isolate which join/filter is dropping rows (e.g., incremental counts after each join).
3. Provide a corrected SQL query that produces the **intended output** (including non-promo mulch sales).
Quick Answer: This question evaluates understanding of SQL joins, filtering interactions, aggregation, and troubleshooting data discrepancies when combining sales, sale_items, products, and promotions tables.