Group users by same last K watched movies
Company: Netflix
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Take-home Project
You are given a list of users' movie-watching histories. Each user’s history is an ordered list of movie IDs in the order they were watched (earliest to latest).
Assume:
- `histories` is a dictionary/map from `user_id` (string or integer) to a list of `movie_id` (string or integer).
- Each list has at least `K` movies.
**Task**
Write an algorithm to group users such that, within each group, all users share the exact same set of the last `K` movies they watched (ignoring order within those last `K` movies). Two users belong to the same group if the following holds:
- Let `S(u)` be the set of the last `K` movies in user `u`'s history.
- Users `u1` and `u2` are in the same group if `S(u1) = S(u2)`.
**Input**
- An integer `K` (1 ≤ K ≤ 10^3).
- A map/dictionary `histories: user_id → [movie_id1, movie_id2, ..., movie_idN]` where N ≥ K for each user.
**Output**
- Return a list of groups, where each group is a list of `user_id`s. Only groups of size ≥ 2 (i.e., at least two users with the same last-K set) need to be returned.
**Example**
Suppose K = 3 and:
- user A: [m1, m2, m3, m4, m5]
- user B: [m7, m3, m4, m5]
- user C: [m2, m3, m4, m6]
Then:
- Last 3 for A: {m3, m4, m5}
- Last 3 for B: {m3, m4, m5}
- Last 3 for C: {m3, m4, m6}
The output should group A and B together, and C can be omitted or appear alone depending on your choice, but only multi-user groups are required in the output.
Describe the algorithm and its time and space complexity.
Quick Answer: This question evaluates competency in set operations, grouping via maps/dictionaries, and algorithmic complexity analysis when comparing suffixes of ordered lists.
You are given a dictionary `histories` mapping each `user_id` to an ordered list of watched `movie_id`s from earliest to latest. For each user, look only at their last `K` watched movies and convert those `K` movies into a set, so order does not matter and duplicates are ignored. Group users together if these last-`K` sets are exactly equal. Return only groups with at least 2 users. If no such groups exist, return an empty list.
For deterministic output, sort each group by the string representation of its user IDs, and sort the final list of groups lexicographically by those sorted groups.
Constraints
- 0 <= number of users <= 10^5
- 1 <= K <= 10^3
- For every user, len(history) >= K
- user_id and movie_id are hashable values such as strings or integers
- Movie IDs may repeat inside a user's history