You’re given two tables.
users(id INT, country STRING, created_at DATE)
rows:
1 | US | 2025-08-28
2 | US | 2025-08-30
3 | IN | 2025-08-29
4 | BR | 2025-08-31
orders(order_id INT, user_id INT, amount DECIMAL, created_at DATE)
rows:
101 | 1 | 20.00 | 2025-08-31
102 | 1 | 10.00 | 2025-09-01
103 | 2 | NULL | 2025-09-01
104 | 3 | 15.00 | 2025-09-01
events(user_id INT, event_name STRING, event_time TIMESTAMP)
rows:
1 | video_play | 2025-09-01 10:01:00
1 | like | 2025-09-01 10:02:00
2 | video_play | 2025-09-01 11:00:00
3 | video_play | 2025-09-01 12:00:00
3 | video_play | 2025-09-01 12:05:00
4 | video_play | 2025-09-01 13:00:00
Answer all parts concisely and provide the exact SQL:
(a) Explain and demonstrate, using a single grouped query by users.country, the differences among COUNT(*), COUNT(amount), and COUNT(DISTINCT user_id) on orders joined to users. Why do NULLs matter? Show the three counts side-by-side and ensure the join does not double-count users with multiple orders.
(b) Compute, per country, the top 2 spenders by total order amount using ROW_NUMBER(). Then repeat using RANK() so that ties are included even if more than 2 users are returned. For country = 'US', list which user_ids appear under each method and why.
(c) Show a pitfall where filtering in WHERE vs HAVING changes results: return users with at least 2 video_play events on 2025-09-01 who have zero non-NULL orders. First, incorrectly filter orders.amount IS NULL in WHERE before aggregation; then correct it using HAVING. Explain the difference in row counts.
(d) Compute average spend per active user (active = at least one event of any type on 2025-09-01) by country, guarding against divide-by-zero and NULL amounts. Use COALESCE and NULLIF appropriately and justify your choices.