This question evaluates proficiency in SQL and data-engineering skills—specifically joins, aggregation and ranking via window functions, event deduplication, and query performance considerations across analytics tables.
You are given two analytics tables:
users(user_id, country, signup_date)
listening_events(event_id, user_id, played_at, minutes_played)
Because the data pipeline may retry writes, listening_events can contain duplicate rows with the same event_id.
Write a SQL query to return, for each country, the top 3 users by total minutes_played in the last 30 days. For each returned row, include country, user_id, total_minutes, and the user's rank within the country.
Then discuss:
listening_events
is extremely large.