Recommend top-K movies from similarity graph
Company: Google
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates understanding of graph traversal and shortest-path distance concepts along with multi-criteria ranking (distance, rating, ID) and efficient exploration of large undirected graphs to avoid revisiting nodes.
Part 1: Recommend top-K movies from similarity graph
Constraints
- 1 <= len(adj) == len(rating) <= 100000
- 0 <= start < len(adj)
- 0 <= k <= len(adj)
- Each neighbor in adj[i] is a valid movie ID in the range [0, len(adj) - 1]
- The graph is undirected, but the input may still contain repeated edges; do not revisit nodes infinitely
- watched may or may not contain start, but start must never be recommended
Examples
Input: ([[1, 2], [0, 3], [0, 3, 4], [1, 2], [2]], [5, 7, 6, 9, 8], 0, [0, 2], 3)
Expected Output: [1, 3, 4]
Explanation: Movie 1 is distance 1 and unwatched. Movies 3 and 4 are distance 2. Among distance-2 movies, rating 9 beats rating 8.
Input: ([[1, 2], [0, 3], [0, 4], [1], [2]], [1, 8, 8, 7, 9], 0, [0], 4)
Expected Output: [1, 2, 4, 3]
Explanation: Movies 1 and 2 are both distance 1 with equal rating, so smaller ID 1 comes first. At distance 2, movie 4 has higher rating than movie 3.
Input: ([[1], [0], []], [3, 4, 5], 0, [0], 5)
Expected Output: [1]
Explanation: Only movie 1 is reachable and unwatched. Movie 2 is disconnected.
Input: ([[]], [10], 0, [], 1)
Expected Output: []
Explanation: Edge case: there is only the start movie, and the start movie itself cannot be recommended.
Solution
def solution(adj, rating, start, watched, k):
from collections import deque
n = len(adj)
if k <= 0 or start < 0 or start >= n:
return []
watched_set = set(watched)
visited = [False] * n
visited[start] = True
q = deque([start])
answer = []
while q and len(answer) < k:
level_size = len(q)
level_candidates = []
for _ in range(level_size):
node = q.popleft()
if node != start and node not in watched_set:
level_candidates.append(node)
for nei in adj[node]:
if 0 <= nei < n and not visited[nei]:
visited[nei] = True
q.append(nei)
level_candidates.sort(key=lambda movie: (-rating[movie], movie))
answer.extend(level_candidates)
return answer[:k]Time complexity: O(M + E + M log M) in the worst case, where M is the number of movies and E is the number of edges. The BFS is O(M + E), and sorting the discovered layers is at worst O(M log M) total.. Space complexity: O(M) for the visited array, queue, and collected recommendations..
Hints
- Use BFS because every edge has the same cost, so BFS gives the shortest hop distance.
- Process one BFS layer at a time. All movies found in the same layer have the same distance, so you only need to sort that layer by rating and movie ID.
Part 2: Classify stale and hot cache keys for movie recommendations
Constraints
- 0 <= len(requests) <= 200000
- 0 <= sum(len(v) for v in cache_entries.values()) <= 200000
- 1 <= hot_threshold <= 1000000000
- Movie IDs are non-negative integers
- cache_entries may contain keys with empty dependency lists
Examples
Input: ([1, 2, 1, 3, 1, 2], {1: [1, 4, 5], 2: [2, 7], 4: [8]}, [5, 7], 3)
Expected Output: ["1:recompute_and_protect", "2:recompute", "3:ok", "4:ok"]
Explanation: Key 1 is requested 3 times and depends on updated movie 5, so it is both hot and stale. Key 2 is stale only. Key 3 is requested but not hot. Key 4 is cached but unaffected.
Input: ([5, 5, 5, 6], {2: [1], 5: [9], 6: [10]}, [11], 2)
Expected Output: ["2:ok", "5:protect_hot_key", "6:ok"]
Explanation: No cache entry is stale because updated movie 11 is not in any dependency list. Key 5 is hot because it appears 3 times, which is at least 2.
Input: ([], {1: [2, 3], 2: [], 3: [4]}, [3, 4], 1)
Expected Output: ["1:recompute", "2:ok", "3:recompute"]
Explanation: There are no requests, so nothing is hot. Keys 1 and 3 are stale because their dependency lists contain updated movies.
Input: ([], {}, [], 2)
Expected Output: []
Explanation: Edge case: no requests and no cached keys, so there is nothing to classify.
Solution
def solution(requests, cache_entries, updated_movies, hot_threshold):
counts = {}
for movie_id in requests:
counts[movie_id] = counts.get(movie_id, 0) + 1
updated_set = set(updated_movies)
all_keys = set(cache_entries.keys()) | set(counts.keys())
result = []
for movie_id in sorted(all_keys):
deps = cache_entries.get(movie_id, [])
stale = any(dep in updated_set for dep in deps)
hot = counts.get(movie_id, 0) >= hot_threshold
if stale and hot:
action = 'recompute_and_protect'
elif stale:
action = 'recompute'
elif hot:
action = 'protect_hot_key'
else:
action = 'ok'
result.append(f'{movie_id}:{action}')
return resultTime complexity: O(R + D + U log U), where R is the number of requests, D is the total number of dependency IDs across all cache entries, and U is the number of unique movie IDs that appear in requests or as cache keys.. Space complexity: O(Ru + Mu + U), where Ru is the number of unique requested keys, Mu is the number of updated movies, and U is the number of output keys..
Hints
- First count how many times each movie ID appears in requests to find hot keys.
- Convert updated_movies to a set so you can test whether a cache entry is stale by scanning its dependency list once.