Group users by same last K watched movies
Company: Netflix
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Take-home Project
Quick Answer: This question evaluates competency in set operations, grouping via maps/dictionaries, and algorithmic complexity analysis when comparing suffixes of ordered lists.
Constraints
- 0 <= number of users <= 10^5
- 1 <= K <= 10^3
- For every user, len(history) >= K
- user_id and movie_id are hashable values such as strings or integers
- Movie IDs may repeat inside a user's history
Examples
Input: (3, {"A": ["m1", "m2", "m3", "m4", "m5"], "B": ["m7", "m3", "m4", "m5"], "C": ["m2", "m3", "m4", "m6"]})
Expected Output: [['A', 'B']]
Explanation: A and B both have last-3 set {m3, m4, m5}. C has {m3, m4, m6}, so it is not grouped.
Input: (2, {1: [5, 2, 3], 2: [8, 3, 2], 3: [1, 6, 7], 4: [7, 6], 5: [9, 8]})
Expected Output: [[1, 2], [3, 4]]
Explanation: Users 1 and 2 share the set {2, 3}. Users 3 and 4 share the set {6, 7}. User 5 is alone.
Input: (3, {101: [1, 2, 2], 102: [2, 1, 1], 103: [1, 3, 3], 104: [9, 8, 7]})
Expected Output: [[101, 102]]
Explanation: The last 3 movies for 101 and 102 become the same set {1, 2} after ignoring order and duplicates.
Input: (2, {})
Expected Output: []
Explanation: With no users, there are no groups to return.
Input: (1, {"u1": ["x", "y"], "u2": ["z", "y"], "u3": ["y", "x"], "u4": ["x"]})
Expected Output: [['u1', 'u2'], ['u3', 'u4']]
Explanation: For K = 1, users are grouped by their single most recent movie: u1 and u2 watched y last, while u3 and u4 watched x last.
Solution
def solution(K, histories):
groups = {}
for user_id, watched in histories.items():
key = frozenset(watched[-K:])
if key not in groups:
groups[key] = []
groups[key].append(user_id)
result = []
for users in groups.values():
if len(users) >= 2:
users.sort(key=lambda x: str(x))
result.append(users)
result.sort(key=lambda group: tuple(str(x) for x in group))
return resultTime complexity: O(U * K + U log U), where U is the number of users. Building each last-K set takes O(K), and the deterministic output sorting costs up to O(U log U) in total.. Space complexity: O(U * K) in the worst case for storing grouping keys and user buckets..
Hints
- Since order within the last K movies does not matter, think about converting those movies into an order-independent hashable key.
- Use a dictionary to collect all users that share the same last-K movie set, then keep only buckets with at least two users.