Group employees by shared interests
Company: Palantir
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's competency in organizing and querying collections, mapping relationships between entities, and reasoning about algorithmic time and space complexity within data structures and grouping operations.
Part 1: Map each interest to the sorted employee IDs
Constraints
- 0 <= N <= 100000, where N is the number of employees
- Employee IDs are unique integers
- The total number of interest strings across all employees is at most 200000
- Each interest string has length between 1 and 50
- Duplicate interests within the same employee should be treated as one interest
Examples
Input: [(101, ['music', 'reading']), (102, ['sports', 'music']), (103, ['reading']), (104, [])]
Expected Output: {'music': [101, 102], 'reading': [101, 103], 'sports': [102]}
Explanation: Employees 101 and 102 share 'music', employees 101 and 103 have 'reading', and only 102 has 'sports'. Employee 104 contributes nothing.
Input: []
Expected Output: {}
Explanation: No employees means no interests in the result.
Input: [(5, ['gaming', 'gaming', 'chess'])]
Expected Output: {'chess': [5], 'gaming': [5]}
Explanation: Repeated 'gaming' for the same employee counts once.
Input: [(7, ['art']), (3, ['art', 'music']), (5, ['music'])]
Expected Output: {'art': [3, 7], 'music': [3, 5]}
Explanation: Employee IDs must be sorted within each interest list.
Input: [(1, []), (2, [])]
Expected Output: {}
Explanation: Employees with no interests do not appear in any mapping.
Solution
def solution(employees):
interest_to_ids = {}
for emp_id, interests in employees:
for interest in set(interests):
if interest not in interest_to_ids:
interest_to_ids[interest] = []
interest_to_ids[interest].append(emp_id)
result = {}
for interest in sorted(interest_to_ids):
result[interest] = sorted(interest_to_ids[interest])
return result
Time complexity: O(T + sum(m_i log m_i) + U log U), where T is the total number of input interest strings, m_i is the number of employees for interest i, and U is the number of unique interests.. Space complexity: O(T), excluding the input; this stores the inverted index and output..
Hints
- Think about reversing the relationship: instead of employee -> interests, build interest -> employees.
- If an employee's list may contain duplicates, convert it to a set before adding the employee ID to each interest bucket.
Part 2: Return all unique employee-ID pairs that share an interest
Constraints
- 0 <= N <= 5000, where N is the number of employees
- Employee IDs are unique integers
- The total number of interest strings across all employees is at most 20000
- Duplicate interests within the same employee should be treated as one interest
- The total number of generated candidate pairs across all interests is at most 200000
Examples
Input: [(1, ['music', 'sports']), (2, ['sports', 'cooking']), (3, ['music']), (4, ['travel'])]
Expected Output: [(1, 2), (1, 3)]
Explanation: Employees 1 and 2 share 'sports', and employees 1 and 3 share 'music'. Employee 4 shares nothing with others.
Input: []
Expected Output: []
Explanation: No employees means no shared-interest pairs.
Input: [(42, ['reading'])]
Expected Output: []
Explanation: A single employee cannot form a pair.
Input: [(10, ['a', 'b', 'b']), (7, ['b', 'c']), (9, ['a', 'c']), (12, [])]
Expected Output: [(7, 9), (7, 10), (9, 10)]
Explanation: Pair (9, 10) shares 'a', pair (7, 10) shares 'b', and pair (7, 9) shares 'c'. Duplicate 'b' for employee 10 counts once.
Input: [(1, ['a', 'b']), (2, ['a', 'b']), (3, ['c'])]
Expected Output: [(1, 2)]
Explanation: Employees 1 and 2 share two interests, but their pair appears only once.
Solution
def solution(employees):
interest_to_ids = {}
for emp_id, interests in employees:
for interest in set(interests):
if interest not in interest_to_ids:
interest_to_ids[interest] = set()
interest_to_ids[interest].add(emp_id)
pairs = set()
for ids in interest_to_ids.values():
sorted_ids = sorted(ids)
for i in range(len(sorted_ids)):
for j in range(i + 1, len(sorted_ids)):
pairs.add((sorted_ids[i], sorted_ids[j]))
return sorted(pairs)
Time complexity: O(T + sum(k_i log k_i + C(k_i, 2)) + P log P), where T is the total number of input interest strings, k_i is the number of employees with interest i, and P is the number of unique output pairs.. Space complexity: O(T + P), excluding the input; this stores the interest groups and the unique output pairs..
Hints
- First group employees by interest, then generate employee pairs inside each interest group.
- Use a set of pairs so that if two employees share multiple interests, the pair is still returned only once.