PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates understanding of graph traversal and shortest-path distance concepts along with multi-criteria ranking (distance, rating, ID) and efficient exploration of large undirected graphs to avoid revisiting nodes.

  • medium
  • Google
  • Coding & Algorithms
  • Software Engineer

Recommend top-K movies from similarity graph

Company: Google

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

## Movie Recommendation: Top K You are building a simple movie recommendation feature. ### Input - A set of movies `0..(M-1)`. - An **undirected** similarity graph `adj`, where `adj[i]` lists movies similar to movie `i`. - An array `rating[i]` (or `score[i]`) for each movie. - A starting movie `start` that the user liked. - A set `watched` of movies the user has already watched. - An integer `K`. ### Output Return a list of up to `K` **recommended movie IDs** that the user has **not watched**. ### Ranking rules 1. Prefer movies with **smaller graph distance** (fewer similarity hops) from `start`. 2. If distances tie, prefer **higher `rating`**. 3. If still tied, break ties by **smaller movie ID**. ### Notes / constraints - The graph can be large; avoid revisiting nodes. - If fewer than `K` unwatched movies are reachable, return all reachable unwatched movies. ### Follow-up (production thinking) What potential issues might arise when deploying this in production (e.g., caching, concurrency, hot keys, stale data), and how would you mitigate them?

Quick Answer: This question evaluates understanding of graph traversal and shortest-path distance concepts along with multi-criteria ranking (distance, rating, ID) and efficient exploration of large undirected graphs to avoid revisiting nodes.

Part 1: Recommend top-K movies from similarity graph

You are given an undirected similarity graph of movies. Movie i is connected to every movie listed in adj[i]. Each movie also has a rating. Starting from a movie start that the user liked, return up to k recommended movie IDs. A movie is eligible only if: - it is reachable from start, and - it is not in watched, and - it is not the start movie itself. Rank eligible movies by these rules: 1. Smaller graph distance from start comes first. 2. If distances tie, higher rating comes first. 3. If both distance and rating tie, smaller movie ID comes first. If fewer than k eligible movies are reachable, return all of them in the correct order.

Constraints

  • 1 <= len(adj) == len(rating) <= 100000
  • 0 <= start < len(adj)
  • 0 <= k <= len(adj)
  • Each neighbor in adj[i] is a valid movie ID in the range [0, len(adj) - 1]
  • The graph is undirected, but the input may still contain repeated edges; do not revisit nodes infinitely
  • watched may or may not contain start, but start must never be recommended

Examples

Input: ([[1, 2], [0, 3], [0, 3, 4], [1, 2], [2]], [5, 7, 6, 9, 8], 0, [0, 2], 3)

Expected Output: [1, 3, 4]

Explanation: Movie 1 is distance 1 and unwatched. Movies 3 and 4 are distance 2. Among distance-2 movies, rating 9 beats rating 8.

Input: ([[1, 2], [0, 3], [0, 4], [1], [2]], [1, 8, 8, 7, 9], 0, [0], 4)

Expected Output: [1, 2, 4, 3]

Explanation: Movies 1 and 2 are both distance 1 with equal rating, so smaller ID 1 comes first. At distance 2, movie 4 has higher rating than movie 3.

Input: ([[1], [0], []], [3, 4, 5], 0, [0], 5)

Expected Output: [1]

Explanation: Only movie 1 is reachable and unwatched. Movie 2 is disconnected.

Input: ([[]], [10], 0, [], 1)

Expected Output: []

Explanation: Edge case: there is only the start movie, and the start movie itself cannot be recommended.

Solution

def solution(adj, rating, start, watched, k):
    from collections import deque

    n = len(adj)
    if k <= 0 or start < 0 or start >= n:
        return []

    watched_set = set(watched)
    visited = [False] * n
    visited[start] = True
    q = deque([start])
    answer = []

    while q and len(answer) < k:
        level_size = len(q)
        level_candidates = []

        for _ in range(level_size):
            node = q.popleft()

            if node != start and node not in watched_set:
                level_candidates.append(node)

            for nei in adj[node]:
                if 0 <= nei < n and not visited[nei]:
                    visited[nei] = True
                    q.append(nei)

        level_candidates.sort(key=lambda movie: (-rating[movie], movie))
        answer.extend(level_candidates)

    return answer[:k]

Time complexity: O(M + E + M log M) in the worst case, where M is the number of movies and E is the number of edges. The BFS is O(M + E), and sorting the discovered layers is at worst O(M log M) total.. Space complexity: O(M) for the visited array, queue, and collected recommendations..

Hints

  1. Use BFS because every edge has the same cost, so BFS gives the shortest hop distance.
  2. Process one BFS layer at a time. All movies found in the same layer have the same distance, so you only need to sort that layer by rating and movie ID.

Part 2: Classify stale and hot cache keys for movie recommendations

A production recommendation service caches results by starting movie ID. Each cached key depends on a list of movie IDs that influenced that recommendation. If any of those dependent movies changes, the cached result becomes stale and must be recomputed. A starting movie is called a hot key if it appears at least hot_threshold times in recent requests. Hot keys should be protected with mitigations such as request coalescing or per-key throttling. For every starting movie that appears either: - in requests, or - as a key in cache_entries, return one action string in the form movie_id:action, sorted by movie_id. Possible actions: - recompute_and_protect: the key is both stale and hot - recompute: the key is stale only - protect_hot_key: the key is hot only - ok: neither stale nor hot

Constraints

  • 0 <= len(requests) <= 200000
  • 0 <= sum(len(v) for v in cache_entries.values()) <= 200000
  • 1 <= hot_threshold <= 1000000000
  • Movie IDs are non-negative integers
  • cache_entries may contain keys with empty dependency lists

Examples

Input: ([1, 2, 1, 3, 1, 2], {1: [1, 4, 5], 2: [2, 7], 4: [8]}, [5, 7], 3)

Expected Output: ["1:recompute_and_protect", "2:recompute", "3:ok", "4:ok"]

Explanation: Key 1 is requested 3 times and depends on updated movie 5, so it is both hot and stale. Key 2 is stale only. Key 3 is requested but not hot. Key 4 is cached but unaffected.

Input: ([5, 5, 5, 6], {2: [1], 5: [9], 6: [10]}, [11], 2)

Expected Output: ["2:ok", "5:protect_hot_key", "6:ok"]

Explanation: No cache entry is stale because updated movie 11 is not in any dependency list. Key 5 is hot because it appears 3 times, which is at least 2.

Input: ([], {1: [2, 3], 2: [], 3: [4]}, [3, 4], 1)

Expected Output: ["1:recompute", "2:ok", "3:recompute"]

Explanation: There are no requests, so nothing is hot. Keys 1 and 3 are stale because their dependency lists contain updated movies.

Input: ([], {}, [], 2)

Expected Output: []

Explanation: Edge case: no requests and no cached keys, so there is nothing to classify.

Solution

def solution(requests, cache_entries, updated_movies, hot_threshold):
    counts = {}
    for movie_id in requests:
        counts[movie_id] = counts.get(movie_id, 0) + 1

    updated_set = set(updated_movies)
    all_keys = set(cache_entries.keys()) | set(counts.keys())

    result = []
    for movie_id in sorted(all_keys):
        deps = cache_entries.get(movie_id, [])
        stale = any(dep in updated_set for dep in deps)
        hot = counts.get(movie_id, 0) >= hot_threshold

        if stale and hot:
            action = 'recompute_and_protect'
        elif stale:
            action = 'recompute'
        elif hot:
            action = 'protect_hot_key'
        else:
            action = 'ok'

        result.append(f'{movie_id}:{action}')

    return result

Time complexity: O(R + D + U log U), where R is the number of requests, D is the total number of dependency IDs across all cache entries, and U is the number of unique movie IDs that appear in requests or as cache keys.. Space complexity: O(Ru + Mu + U), where Ru is the number of unique requested keys, Mu is the number of updated movies, and U is the number of output keys..

Hints

  1. First count how many times each movie ID appears in requests to find hot keys.
  2. Convert updated_movies to a set so you can test whether a cache entry is stale by scanning its dependency list once.
Last updated: May 22, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Solve Flower Placement and Directory Deletion - Google (medium)
  • Compute Turnstile Crossing Times - Google (hard)
  • Simulate In-Place Cellular State Updates - Google (hard)
  • Determine Whether a Word Exists in a Graph - Google (medium)
  • Implement Checksums and Feature Rollout Evaluation - Google (medium)