PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates algorithm design skills and data-structure use for set-based similarity detection and pairwise comparisons, along with the ability to analyze time and space complexity for scalable solutions.

  • medium
  • Netflix
  • Coding & Algorithms
  • Software Engineer

Find user pairs with overlapping last K movies

Company: Netflix

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Take-home Project

You are given the same user movie-watching histories as in the previous problem. Each user’s history is an ordered list of movie IDs from earliest to latest. Assume: - `histories` is a dictionary/map from `user_id` to a list of `movie_id`. - Each list has at least `K` movies. **Task** Given integers `K` and `M` (1 ≤ M ≤ K), find all unordered pairs of distinct users `(u1, u2)` such that: - If `L(u)` is the list of the last `K` movies in user `u`'s history, then the number of common movies between `L(u1)` and `L(u2)` (treated as sets or as lists ignoring order and duplicates) is at least `M`. Formally, define: - `S(u) = set of the last K movies in user u's history`. - Return all pairs `(u1, u2)` with `u1 < u2` (or any consistent ordering) such that `|S(u1) ∩ S(u2)| ≥ M`. **Input** - An integer `K` (1 ≤ K ≤ 10^3). - An integer `M` (1 ≤ M ≤ K). - A map/dictionary `histories: user_id → [movie_id1, movie_id2, ..., movie_idN]` where N ≥ K for each user. **Output** - A list of all user pairs `(u1, u2)` that satisfy `|S(u1) ∩ S(u2)| ≥ M`. **Example** Let K = 4, M = 2 and: - user A: [m1, m2, m3, m4, m5] - user B: [m2, m3, m6, m4] - user C: [m7, m8, m9, m10] Then: - S(A) = {m2, m3, m4, m5} - S(B) = {m2, m3, m4, m6} - S(C) = {m7, m8, m9, m10} We have: - |S(A) ∩ S(B)| = 3 ≥ M → (A, B) is a valid pair. - |S(A) ∩ S(C)| = 0 < M → invalid. - |S(B) ∩ S(C)| = 0 < M → invalid. So the output should include the pair (A, B) only. Design an efficient algorithm for large numbers of users and movies, and analyze its time and space complexity.

Quick Answer: This question evaluates algorithm design skills and data-structure use for set-based similarity detection and pairwise comparisons, along with the ability to analyze time and space complexity for scalable solutions.

You are given a dictionary `histories` that maps each `user_id` to an ordered list of `movie_id` values from earliest to latest. For each user `u`, consider only the last `K` movies in their history, then ignore order and duplicates by converting those movies into a set `S(u)`. Return all unordered pairs of distinct users whose last-`K` sets overlap in at least `M` movies. A pair `(u1, u2)` is valid if `u1 < u2` and `|S(u1) ∩ S(u2)| >= M`. For deterministic grading, return the pairs as a sorted list of tuples, with each tuple ordered so the smaller user ID comes first.

Constraints

  • 1 <= K <= 10^3
  • 1 <= M <= K
  • Each user's history length is at least K
  • Duplicates inside the last K movies count only once because overlap is computed on sets
  • All user IDs in one input are of the same comparable type (for example, all strings or all integers)
  • Movie IDs are hashable values

Examples

Input: (4, 2, {'A': ['m1', 'm2', 'm3', 'm4', 'm5'], 'B': ['m2', 'm3', 'm6', 'm4'], 'C': ['m7', 'm8', 'm9', 'm10']})

Expected Output: [('A', 'B')]

Explanation: Last-4 sets are A={m2,m3,m4,m5}, B={m2,m3,m4,m6}, C={m7,m8,m9,m10}. Only A and B share at least 2 movies.

Input: (3, 1, {'u1': [1, 2, 2, 3], 'u2': [2, 3, 3, 4], 'u3': [5, 6, 7]})

Expected Output: [('u1', 'u2')]

Explanation: u1's last-3 movies become set {2,3}; u2's become {3,4}; u3's become {5,6,7}. Only u1 and u2 overlap, and duplicates are ignored.

Input: (2, 1, {1: [10, 20], 2: [20, 30], 3: [30, 20], 4: [40, 50]})

Expected Output: [(1, 2), (1, 3), (2, 3)]

Explanation: User 1 shares movie 20 with users 2 and 3, and users 2 and 3 share both 20 and 30. User 4 shares nothing with others.

Input: (2, 2, {'x': [1, 2, 3], 'y': [0, 2, 3], 'z': [2, 4, 3]})

Expected Output: [('x', 'y')]

Explanation: Last-2 sets are x={2,3}, y={2,3}, z={3,4}. Only x and y have overlap size 2, which equals M.

Input: (2, 1, {'p': [1, 2], 'q': [3, 4], 'r': [5, 6]})

Expected Output: []

Explanation: No two users share any movie in their last-2 sets.

Input: (1, 1, {'solo': [42]})

Expected Output: []

Explanation: There is only one user, so no unordered pair can be formed.

Solution

def solution(K, M, histories):
    from collections import defaultdict

    if len(histories) < 2:
        return []

    users = sorted(histories.keys())
    movie_to_users = defaultdict(list)

    for user in users:
        recent_movies = set(histories[user][-K:])
        for movie in recent_movies:
            movie_to_users[movie].append(user)

    pair_counts = defaultdict(int)

    for users_with_movie in movie_to_users.values():
        n = len(users_with_movie)
        for i in range(n):
            u1 = users_with_movie[i]
            for j in range(i + 1, n):
                u2 = users_with_movie[j]
                pair_counts[(u1, u2)] += 1

    result = [pair for pair, count in pair_counts.items() if count >= M]
    result.sort()
    return result

Time complexity: O(U*K + sum over movies of c_m^2 + R log R), where U is the number of users, c_m is the number of users whose last-K set contains movie m, and R is the number of returned pairs. Space complexity: O(U*K + P), where P is the number of user pairs that share at least one movie.

Hints

  1. First reduce each user's last K movies to a set so duplicates do not increase the overlap count.
  2. Instead of comparing every pair of users directly, build an inverted index from movie ID to the users whose last-K set contains that movie.
Last updated: Apr 26, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Compute Minimum Task Completion Time - Netflix (medium)
  • Solve String Arrays and Row Deduplication - Netflix (medium)
  • Implement Cache, Undo, and DFS - Netflix
  • Implement Streaming Word Counter - Netflix (medium)
  • Implement TTL Cache and Tree Balance Reporting - Netflix (medium)