How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Take-home Project rounds at Netflix.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Netflix during technical interviews.

Find user pairs with overlapping last K movies

Q: Find user pairs with overlapping last K movies

This question evaluates algorithm design skills and data-structure use for set-based similarity detection and pairwise comparisons, along with the ability to analyze time and space complexity for scalable solutions.

Company: Netflix

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Take-home Project

You are given the same user movie-watching histories as in the previous problem. Each user’s history is an ordered list of movie IDs from earliest to latest. Assume: - `histories` is a dictionary/map from `user_id` to a list of `movie_id`. - Each list has at least `K` movies. **Task** Given integers `K` and `M` (1 ≤ M ≤ K), find all unordered pairs of distinct users `(u1, u2)` such that: - If `L(u)` is the list of the last `K` movies in user `u`'s history, then the number of common movies between `L(u1)` and `L(u2)` (treated as sets or as lists ignoring order and duplicates) is at least `M`. Formally, define: - `S(u) = set of the last K movies in user u's history`. - Return all pairs `(u1, u2)` with `u1 < u2` (or any consistent ordering) such that `|S(u1) ∩ S(u2)| ≥ M`. **Input** - An integer `K` (1 ≤ K ≤ 10^3). - An integer `M` (1 ≤ M ≤ K). - A map/dictionary `histories: user_id → [movie_id1, movie_id2, ..., movie_idN]` where N ≥ K for each user. **Output** - A list of all user pairs `(u1, u2)` that satisfy `|S(u1) ∩ S(u2)| ≥ M`. **Example** Let K = 4, M = 2 and: - user A: [m1, m2, m3, m4, m5] - user B: [m2, m3, m6, m4] - user C: [m7, m8, m9, m10] Then: - S(A) = {m2, m3, m4, m5} - S(B) = {m2, m3, m4, m6} - S(C) = {m7, m8, m9, m10} We have: - |S(A) ∩ S(B)| = 3 ≥ M → (A, B) is a valid pair. - |S(A) ∩ S(C)| = 0 < M → invalid. - |S(B) ∩ S(C)| = 0 < M → invalid. So the output should include the pair (A, B) only. Design an efficient algorithm for large numbers of users and movies, and analyze its time and space complexity.

Quick Answer: This question evaluates algorithm design skills and data-structure use for set-based similarity detection and pairwise comparisons, along with the ability to analyze time and space complexity for scalable solutions.

You are given a dictionary `histories` that maps each `user_id` to an ordered list of `movie_id` values from earliest to latest. For each user `u`, consider only the last `K` movies in their history, then ignore order and duplicates by converting those movies into a set `S(u)`. Return all unordered pairs of distinct users whose last-`K` sets overlap in at least `M` movies. A pair `(u1, u2)` is valid if `u1 < u2` and `|S(u1) ∩ S(u2)| >= M`. For deterministic grading, return the pairs as a sorted list of tuples, with each tuple ordered so the smaller user ID comes first.

Constraints

1 <= K <= 10^3
1 <= M <= K
Each user's history length is at least K
Duplicates inside the last K movies count only once because overlap is computed on sets
All user IDs in one input are of the same comparable type (for example, all strings or all integers)
Movie IDs are hashable values

Examples

Input: (4, 2, {'A': ['m1', 'm2', 'm3', 'm4', 'm5'], 'B': ['m2', 'm3', 'm6', 'm4'], 'C': ['m7', 'm8', 'm9', 'm10']})

Expected Output: [('A', 'B')]

Explanation: Last-4 sets are A={m2,m3,m4,m5}, B={m2,m3,m4,m6}, C={m7,m8,m9,m10}. Only A and B share at least 2 movies.

Input: (3, 1, {'u1': [1, 2, 2, 3], 'u2': [2, 3, 3, 4], 'u3': [5, 6, 7]})

Expected Output: [('u1', 'u2')]

Explanation: u1's last-3 movies become set {2,3}; u2's become {3,4}; u3's become {5,6,7}. Only u1 and u2 overlap, and duplicates are ignored.

Input: (2, 1, {1: [10, 20], 2: [20, 30], 3: [30, 20], 4: [40, 50]})

Expected Output: [(1, 2), (1, 3), (2, 3)]

Explanation: User 1 shares movie 20 with users 2 and 3, and users 2 and 3 share both 20 and 30. User 4 shares nothing with others.

Input: (2, 2, {'x': [1, 2, 3], 'y': [0, 2, 3], 'z': [2, 4, 3]})

Expected Output: [('x', 'y')]

Explanation: Last-2 sets are x={2,3}, y={2,3}, z={3,4}. Only x and y have overlap size 2, which equals M.

Input: (2, 1, {'p': [1, 2], 'q': [3, 4], 'r': [5, 6]})

Expected Output: []

Explanation: No two users share any movie in their last-2 sets.

Input: (1, 1, {'solo': [42]})

Expected Output: []

Explanation: There is only one user, so no unordered pair can be formed.

Solution

def solution(K, M, histories):
    from collections import defaultdict

    if len(histories) < 2:
        return []

    users = sorted(histories.keys())
    movie_to_users = defaultdict(list)

    for user in users:
        recent_movies = set(histories[user][-K:])
        for movie in recent_movies:
            movie_to_users[movie].append(user)

    pair_counts = defaultdict(int)

    for users_with_movie in movie_to_users.values():
        n = len(users_with_movie)
        for i in range(n):
            u1 = users_with_movie[i]
            for j in range(i + 1, n):
                u2 = users_with_movie[j]
                pair_counts[(u1, u2)] += 1

    result = [pair for pair, count in pair_counts.items() if count >= M]
    result.sort()
    return result

Time complexity: O(U*K + sum over movies of c_m^2 + R log R), where U is the number of users, c_m is the number of users whose last-K set contains movie m, and R is the number of returned pairs. Space complexity: O(U*K + P), where P is the number of user pairs that share at least one movie.

Hints

First reduce each user's last K movies to a set so duplicates do not increase the overlap count.
Instead of comparing every pair of users directly, build an inverted index from movie ID to the users whose last-K set contains that movie.

Quick Overview

This question evaluates algorithm design skills and data-structure use for set-based similarity detection and pairwise comparisons, along with the ability to analyze time and space complexity for scalable solutions.