Find top co-purchased products
Company: Snapchat
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates algorithmic problem-solving and data engineering skills, focusing on computing and ranking co-occurrence frequencies, choosing efficient data structures for pairwise counts, and contrasting offline batch implementation with streaming updates.
Constraints
- 1 <= number of orders <= 10^5
- 0 <= products per order <= 10^3
- Product IDs are non-negative integers and may span a very large ID space
- 1 <= k <= 10^4
- A product ID may appear multiple times within a single order; count it at most once per order
Examples
Input: ([[1, 2, 3], [1, 2, 4], [1, 3], [2, 5]], 1, 2)
Expected Output: [2, 3]
Explanation: Counts co-occurring with 1: {2:2, 3:2, 4:1}. Tie between 2 and 3 broken by smaller ID; top 2 -> [2, 3].
Input: ([[1, 2, 2, 3], [1, 2], [1, 3, 3]], 1, 10)
Expected Output: [2, 3]
Explanation: Duplicates within an order count once: order 1 contributes {2,3}, order 2 {2}, order 3 {3}. Counts {2:2, 3:2}; tie broken by smaller ID -> [2, 3].
Input: ([[10, 20], [10, 30], [10, 20], [10, 30], [10, 40]], 10, 1)
Expected Output: [20]
Explanation: Counts {20:2, 30:2, 40:1}. k=1, tie between 20 and 30 broken by smaller ID -> [20].
Input: ([[5, 6, 7], [8, 9]], 1, 3)
Expected Output: []
Explanation: Product 1 never appears in any order, so no co-occurrences -> [].
Input: ([], 1, 5)
Expected Output: []
Explanation: Edge case: no orders at all -> [].
Input: ([[1, 2], [1, 3], [1, 4]], 1, 5)
Expected Output: [2, 3, 4]
Explanation: All co-occur once: {2:1, 3:1, 4:1}; k=5 exceeds the 3 available, return all sorted by ascending ID.
Input: ([[1, 7], [1, 7], [1, 3], [1, 3], [1, 9]], 1, 2)
Expected Output: [3, 7]
Explanation: Counts {7:2, 3:2, 9:1}. Tie between 3 and 7 broken by smaller ID -> [3, 7].
Hints
- Treat each order as a set so duplicate IDs in one transaction are not double-counted.
- Only orders that actually contain productId contribute to the co-occurrence counts.
- Use a hash map from product ID to count; you never need to materialize the full product ID space.
- Rank with a comparator that sorts by descending count then ascending product ID, then take the first k. For huge result sets a size-k heap with the same ordering avoids fully sorting U items.
- Streaming extension: on each arriving order containing productId, increment counts incrementally; the same map answers point queries. For very large ID spaces, bound memory with count-min sketch / heavy-hitters (Space-Saving) to approximate top-k.