Find user pairs with overlapping last K movies
Company: Netflix
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Take-home Project
Quick Answer: This question evaluates algorithm design skills and data-structure use for set-based similarity detection and pairwise comparisons, along with the ability to analyze time and space complexity for scalable solutions.
Constraints
- 1 <= K <= 10^3
- 1 <= M <= K
- Each user's history length is at least K
- Duplicates inside the last K movies count only once because overlap is computed on sets
- All user IDs in one input are of the same comparable type (for example, all strings or all integers)
- Movie IDs are hashable values
Examples
Input: (4, 2, {'A': ['m1', 'm2', 'm3', 'm4', 'm5'], 'B': ['m2', 'm3', 'm6', 'm4'], 'C': ['m7', 'm8', 'm9', 'm10']})
Expected Output: [('A', 'B')]
Explanation: Last-4 sets are A={m2,m3,m4,m5}, B={m2,m3,m4,m6}, C={m7,m8,m9,m10}. Only A and B share at least 2 movies.
Input: (3, 1, {'u1': [1, 2, 2, 3], 'u2': [2, 3, 3, 4], 'u3': [5, 6, 7]})
Expected Output: [('u1', 'u2')]
Explanation: u1's last-3 movies become set {2,3}; u2's become {3,4}; u3's become {5,6,7}. Only u1 and u2 overlap, and duplicates are ignored.
Input: (2, 1, {1: [10, 20], 2: [20, 30], 3: [30, 20], 4: [40, 50]})
Expected Output: [(1, 2), (1, 3), (2, 3)]
Explanation: User 1 shares movie 20 with users 2 and 3, and users 2 and 3 share both 20 and 30. User 4 shares nothing with others.
Input: (2, 2, {'x': [1, 2, 3], 'y': [0, 2, 3], 'z': [2, 4, 3]})
Expected Output: [('x', 'y')]
Explanation: Last-2 sets are x={2,3}, y={2,3}, z={3,4}. Only x and y have overlap size 2, which equals M.
Input: (2, 1, {'p': [1, 2], 'q': [3, 4], 'r': [5, 6]})
Expected Output: []
Explanation: No two users share any movie in their last-2 sets.
Input: (1, 1, {'solo': [42]})
Expected Output: []
Explanation: There is only one user, so no unordered pair can be formed.
Solution
def solution(K, M, histories):
from collections import defaultdict
if len(histories) < 2:
return []
users = sorted(histories.keys())
movie_to_users = defaultdict(list)
for user in users:
recent_movies = set(histories[user][-K:])
for movie in recent_movies:
movie_to_users[movie].append(user)
pair_counts = defaultdict(int)
for users_with_movie in movie_to_users.values():
n = len(users_with_movie)
for i in range(n):
u1 = users_with_movie[i]
for j in range(i + 1, n):
u2 = users_with_movie[j]
pair_counts[(u1, u2)] += 1
result = [pair for pair, count in pair_counts.items() if count >= M]
result.sort()
return resultTime complexity: O(U*K + sum over movies of c_m^2 + R log R), where U is the number of users, c_m is the number of users whose last-K set contains movie m, and R is the number of returned pairs. Space complexity: O(U*K + P), where P is the number of user pairs that share at least one movie.
Hints
- First reduce each user's last K movies to a set so duplicates do not increase the overlap count.
- Instead of comparing every pair of users directly, build an inverted index from movie ID to the users whose last-K set contains that movie.