Solve set equality and ad log top‑K
Company: Pinterest
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This multi-part question evaluates set and multiset operations, frequency counting, top-K computation over time windows, sliding-window and streaming data structures, and character-multiset matching for string construction within the coding & algorithms domain.
Part 1: Set Equality Ignoring Duplicates
Constraints
- 0 <= len(A), len(B) <= 200000
- Array values fit in 32-bit signed integer
- Duplicates may appear any number of times
- The arrays are not guaranteed to be sorted
Examples
Input: ([1, 2, 2, 3], [3, 1, 2])
Expected Output: True
Explanation: Both arrays represent the set {1, 2, 3}.
Input: ([1, 2], [1, 2, 4])
Expected Output: False
Explanation: The second array contains 4, so the sets differ.
Input: ([], [])
Expected Output: True
Explanation: Two empty arrays represent the same empty set.
Input: ([0, -1, 2], [2, 0, -1, -1])
Expected Output: True
Explanation: Duplicates in the second array do not change the set.
Solution
def solution(A, B):
return set(A) == set(B)
Time complexity: O(n + m), where n = len(A) and m = len(B). Space complexity: O(u + v), where u and v are the number of unique elements in A and B.
Hints
- Duplicates do not matter, so focus on unique elements only.
- A hash-based container can compare the unique contents of both arrays in linear time.
Part 2: Top-K Ads in a Time Window (Batch)
Constraints
- 0 <= len(events) <= 200000
- 0 <= W_ms, T_end <= 10^9
- 0 <= K <= 200000
- Each event is [timestamp_ms, ad_id]
- events is sorted by timestamp_ms ascending
- ad_id fits in 32-bit signed integer
Examples
Input: ([[100, 1], [150, 2], [180, 1], [220, 3], [300, 2]], 100, 220, 2)
Expected Output: [1, 2]
Explanation: The window is [120, 220]. Ads 2, 1, and 3 each appear once, so tie-breaking by smaller ad_id gives [1, 2, 3]. The top 2 are [1, 2].
Input: ([[10, 5], [20, 5], [25, 3], [40, 3], [50, 3]], 30, 50, 2)
Expected Output: [3, 5]
Explanation: The window is [20, 50]. Ad 3 appears 3 times and ad 5 appears once.
Input: ([[1, 1], [2, 2]], 5, 10, 3)
Expected Output: []
Explanation: The window is [5, 10], which contains no events.
Input: ([[1, 1], [2, 2]], 10, 2, 0)
Expected Output: []
Explanation: If K is 0, the answer is an empty list.
Solution
def solution(events, W_ms, T_end, K):
from bisect import bisect_left, bisect_right
from collections import Counter
if K <= 0 or not events:
return []
timestamps = [event[0] for event in events]
start = T_end - W_ms
left = bisect_left(timestamps, start)
right = bisect_right(timestamps, T_end)
if left >= right:
return []
counts = Counter()
for i in range(left, right):
ad_id = events[i][1]
counts[ad_id] += 1
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
return [ad_id for ad_id, _ in ranked[:K]]
Time complexity: O(n + m + u log u), where n is the number of events, m is the number of events inside the window, and u is the number of distinct ad IDs in the window. Space complexity: O(n + u).
Hints
- Since events are sorted by time, first find the left and right boundaries of the window.
- After counting frequencies inside the window, sort by (-count, ad_id) for deterministic output.
Part 3: Streaming Top-K Ads with Sliding Window Queries
Constraints
- 0 <= len(operations) <= 200000
- Ingest operations are of the form [1, timestamp_ms, ad_id]
- Query operations are of the form [2, T_end, W_ms, K]
- Timestamps in ingest operations are non-decreasing
- 0 <= timestamp_ms, T_end, W_ms <= 10^9
- 0 <= K <= 200000
- ad_id fits in 32-bit signed integer
Examples
Input: ([[1, 100, 7], [1, 120, 5], [1, 150, 7], [2, 150, 60, 2]],)
Expected Output: [[7, 5]]
Explanation: The query window is [90, 150]. Ad 7 appears twice and ad 5 appears once.
Input: ([[2, 50, 10, 3], [1, 10, 1], [1, 20, 2], [1, 20, 1], [2, 20, 15, 2], [2, 15, 10, 2]],)
Expected Output: [[], [1, 2], [1]]
Explanation: The first query happens before any ingest, so it returns []. The second query sees timestamps in [5, 20], giving counts {1: 2, 2: 1}. The last query only includes timestamp 10.
Input: ([[1, 1, 3], [1, 2, 2], [1, 3, 2], [1, 4, 3], [2, 4, 10, 5]],)
Expected Output: [[2, 3]]
Explanation: Ads 2 and 3 both appear twice, so the smaller ad_id 2 comes first.
Input: ([[1, 1, 9], [2, 1, 10, 0]],)
Expected Output: [[]]
Explanation: If K is 0, the query answer is an empty list.
Solution
def solution(operations):
from bisect import bisect_left, bisect_right
from collections import Counter
timestamps = []
ad_ids = []
answers = []
for op in operations:
if not op:
continue
if op[0] == 1:
_, timestamp_ms, ad_id = op
timestamps.append(timestamp_ms)
ad_ids.append(ad_id)
elif op[0] == 2:
_, T_end, W_ms, K = op
if K <= 0 or not timestamps:
answers.append([])
continue
start = T_end - W_ms
left = bisect_left(timestamps, start)
right = bisect_right(timestamps, T_end)
if left >= right:
answers.append([])
continue
counts = Counter()
for i in range(left, right):
counts[ad_ids[i]] += 1
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
answers.append([ad_id for ad_id, _ in ranked[:K]])
return answers
Time complexity: O(n + sum(log n + m_i + u_i log u_i)) over all queries, where n is the number of ingested events, m_i is the number of events in query i's window, and u_i is the number of distinct ad IDs in that window. Space complexity: O(n), where n is the number of ingested events stored.
Hints
- Because ingested timestamps are non-decreasing, you can store them in a simple array and binary search query boundaries.
- For each query, only count impressions whose indices fall inside the time window.
Part 4: Can Target Be Formed from Source Characters?
Constraints
- 0 <= len(source), len(target) <= 200000
- Both strings contain lowercase English letters a-z
- Each character from source can be used at most once
Examples
Input: ("aab", "aba")
Expected Output: True
Explanation: source contains two a characters and one b, which is enough to build target.
Input: ("ab", "abb")
Expected Output: False
Explanation: target needs two b characters but source only has one.
Input: ("abc", "")
Expected Output: True
Explanation: An empty target always can be formed.
Input: ("", "a")
Expected Output: False
Explanation: A non-empty target cannot be formed from an empty source.
Solution
def solution(source, target):
from collections import Counter
if len(target) > len(source):
return False
available = Counter(source)
needed = Counter(target)
for ch, cnt in needed.items():
if available[ch] < cnt:
return False
return True
Time complexity: O(n + m), where n = len(source) and m = len(target). Space complexity: O(1) if the alphabet is fixed to 26 lowercase letters, otherwise O(u) for the number of distinct characters.
Hints
- Count how many times each character is needed in target.
- If any character is required more often than it appears in source, the answer is False.