Compare WHERE vs HAVING with aggregates
Company: PayPal
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
Filter groups based on an aggregate and explain WHERE vs HAVING. Provide a query that returns merchants with chargeback_rate > 0.5% in the last 30 days using HAVING. Clarify why WHERE cannot reference aggregates, how HAVING works after GROUP BY, dialect differences (e.g., MySQL allowing expressions in HAVING without GROUP BY), and performance considerations.
Quick Answer: Compare WHERE vs HAVING with aggregates evaluates SQL or pandas logic, joins, grouping, window functions, null handling, edge cases, and validation in a realistic interview setting. A strong answer states assumptions, handles edge cases, explains trade-offs, and shows how to validate the result clearly.
Solution
# Solution Alignment
`WHERE` filters rows before grouping; `HAVING` filters groups after aggregate values are computed.
```sql
SELECT
merchant_id,
COUNT(*) AS total_transactions,
SUM(CASE WHEN is_chargeback THEN 1 ELSE 0 END) AS chargebacks,
SUM(CASE WHEN is_chargeback THEN 1 ELSE 0 END)::numeric / NULLIF(COUNT(*), 0) AS chargeback_rate
FROM transactions
WHERE transaction_time >= CURRENT_DATE - INTERVAL '30 day'
GROUP BY merchant_id
HAVING SUM(CASE WHEN is_chargeback THEN 1 ELSE 0 END)::numeric / NULLIF(COUNT(*), 0) > 0.005
ORDER BY chargeback_rate DESC;
```
The time filter belongs in `WHERE` because it applies to raw rows. The chargeback-rate filter belongs in `HAVING` because it depends on grouped aggregates. Some dialects allow aliases in `HAVING`; repeating the expression is more portable.