PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches
|Home/Coding & Algorithms/Spotify

Join Tables and Rank Users

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in SQL and data-engineering skills—specifically joins, aggregation and ranking via window functions, event deduplication, and query performance considerations across analytics tables.

  • medium
  • Spotify
  • Coding & Algorithms
  • Machine Learning Engineer

Join Tables and Rank Users

Company: Spotify

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

You are given two analytics tables: - `users(user_id, country, signup_date)` - `listening_events(event_id, user_id, played_at, minutes_played)` Because the data pipeline may retry writes, `listening_events` can contain duplicate rows with the same `event_id`. Write a SQL query to return, for each country, the top 3 users by total `minutes_played` in the last 30 days. For each returned row, include `country`, `user_id`, `total_minutes`, and the user's rank within the country. Then discuss: 1. How you would deduplicate events before aggregation. 2. How you would optimize the query if `listening_events` is extremely large. 3. When partitioning versus sharding would help.

Quick Answer: This question evaluates proficiency in SQL and data-engineering skills—specifically joins, aggregation and ranking via window functions, event deduplication, and query performance considerations across analytics tables.

Spotify logo
Spotify
Mar 4, 2026, 12:00 AM
Machine Learning Engineer
Onsite
Coding & Algorithms
5
0

You are given two analytics tables:

  • users(user_id, country, signup_date)
  • listening_events(event_id, user_id, played_at, minutes_played)

Because the data pipeline may retry writes, listening_events can contain duplicate rows with the same event_id.

Write a SQL query to return, for each country, the top 3 users by total minutes_played in the last 30 days. For each returned row, include country, user_id, total_minutes, and the user's rank within the country.

Then discuss:

  1. How you would deduplicate events before aggregation.
  2. How you would optimize the query if listening_events is extremely large.
  3. When partitioning versus sharding would help.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Coding & Algorithms•More Spotify•More Machine Learning Engineer•Spotify Machine Learning Engineer•Spotify Coding & Algorithms•Machine Learning Engineer Coding & Algorithms
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.