Implement Task Management and Duplicate Detection
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates implementation and system-design skills across two problems: an in-memory task management service (CRUD operations, prioritized listing, user quotas and assignment semantics, and recording task change history) and a file deduplication routine involving directory traversal and content-based file comparison.
Part 1: Task Management - Level 1 Basic CRUD
Constraints
- 0 <= len(operations) <= 100000
- task_id values are strings and are globally unique across the whole input
- priority and created_at are integers
Examples
Input: [('create', 't1', 'alpha', 3, 10, None), ('read', 't1'), ('update', 't1', 'beta', 5, 'u1'), ('read', 't1'), ('delete', 't1'), ('read', 't1')]
Expected Output: [True, ('t1', 'alpha', 3, 10, None), True, ('t1', 'beta', 5, 10, 'u1'), True, None]
Explanation: Basic create, read, update, delete flow. The update keeps created_at = 10.
Input: [('create', 't1', 'a', 1, 1, None), ('create', 't1', 'b', 2, 2, None), ('update', 't2', 'x', 3, None), ('delete', 't2')]
Expected Output: [True, False, False, False]
Explanation: Duplicate create fails, and operations on a missing task also fail.
Input: [('create', 'x', 'a', 1, 100, None), ('delete', 'x'), ('create', 'x', 'b', 2, 200, None)]
Expected Output: [True, True, False]
Explanation: Task IDs cannot be reused after deletion.
Input: []
Expected Output: []
Explanation: Edge case: no operations.
Hints
- A dictionary keyed by task_id gives O(1) average-time CRUD operations.
- Store created_at once on creation and reuse it during updates instead of replacing it.
Part 2: Task Management - Level 2 Sorted Active Task Listing
Constraints
- 0 <= len(operations) <= 100000
- priority values are integers
- Only successful 'create' operations assign creation order
Examples
Input: [('create', 't1', 2), ('create', 't2', 5), ('create', 't3', 5), ('list',), ('update_priority', 't1', 6), ('list',)]
Expected Output: [['t2', 't3', 't1'], ['t1', 't2', 't3']]
Explanation: Initially t2 and t3 tie on priority, so earlier creation t2 comes first. After updating t1 to 6, it becomes first.
Input: [('create', 'x', 1), ('delete', 'x'), ('create', 'x', 9), ('list',)]
Expected Output: [[]]
Explanation: The second create is ignored because IDs cannot be reused, so there are no active tasks.
Input: [('list',)]
Expected Output: [[]]
Explanation: Edge case: listing when there are no tasks.
Input: [('create', 'a', 2), ('update_priority', 'missing', 7), ('create', 'b', 2), ('list',)]
Expected Output: [['a', 'b']]
Explanation: Invalid updates are ignored. Tasks with equal priority are ordered by creation time.
Hints
- Store a monotonic creation counter when a task is first created successfully.
- For each listing, sort by a key like (-priority, creation_order).
Part 3: Task Management - Level 3 User Quotas and Assignment
Constraints
- 0 <= len(operations) <= 100000
- 0 <= quota <= 100000
- All task_id and user_id values are strings
Examples
Input: [('add_user', 'u1', 1), ('add_user', 'u2', 2), ('add_task', 't1'), ('add_task', 't2'), ('assign', 't1', 'u1'), ('assign', 't2', 'u1'), ('assign', 't2', 'u2'), ('get_user_tasks', 'u1'), ('get_user_tasks', 'u2')]
Expected Output: [True, True, True, True, True, False, True, ['t1'], ['t2']]
Explanation: u1 can hold only one task, so assigning t2 to u1 fails. Assigning t2 to u2 succeeds.
Input: [('add_user', 'u1', 2), ('add_user', 'u2', 1), ('add_task', 't1'), ('assign', 't1', 'u1'), ('assign', 't1', 'u2'), ('get_user_tasks', 'u1'), ('get_user_tasks', 'u2'), ('unassign', 't1'), ('unassign', 't1')]
Expected Output: [True, True, True, True, True, [], ['t1'], True, False]
Explanation: The second assignment reassigns t1 from u1 to u2. Unassigning again fails because the task is already unassigned.
Input: [('add_user', 'u0', 0), ('add_task', 't1'), ('assign', 't1', 'u0'), ('get_user_tasks', 'missing'), ('unassign', 't1')]
Expected Output: [True, True, False, None, False]
Explanation: A user with quota 0 cannot receive any tasks. Querying a missing user returns None.
Input: [('add_user', 'u1', 1), ('add_user', 'u1', 2), ('add_task', 't1'), ('add_task', 't1')]
Expected Output: [True, False, True, False]
Explanation: Duplicate users and duplicate tasks are rejected.
Hints
- Track both directions: task -> current user and user -> set of assigned tasks.
- Handle the three assignment cases separately: unassigned task, already assigned to same user, and reassignment to a different user.
Part 4: Task Management - Level 4 Task History Queries
Constraints
- 0 <= len(operations) <= 100000
- task_id and user_id are strings
- Only successful operations generate history entries
Examples
Input: [('create', 't1', 'alpha', 1), ('update', 't1', 'beta', 3), ('assign', 't1', 'u1'), ('unassign', 't1'), ('delete', 't1'), ('history', 't1')]
Expected Output: [[('created', 'alpha', 1), ('updated', 'beta', 3), ('assigned', 'u1'), ('unassigned', 'u1'), ('deleted',)]]
Explanation: All successful state changes appear in order.
Input: [('create', 't2', 'x', 2), ('assign', 't2', 'alice'), ('assign', 't2', 'alice'), ('assign', 't2', 'bob'), ('history', 't2')]
Expected Output: [[('created', 'x', 2), ('assigned', 'alice'), ('unassigned', 'alice'), ('assigned', 'bob')]]
Explanation: Assigning to the same user is ignored. Reassignment creates unassign and assign events.
Input: [('update', 'missing', 'x', 1), ('history', 'missing'), ('create', 'k', 'a', 1), ('delete', 'k'), ('update', 'k', 'b', 2), ('history', 'k')]
Expected Output: [[], [('created', 'a', 1), ('deleted',)]]
Explanation: Invalid operations do not create history. History on a never-created task is empty.
Input: [('create', 'x', 'a', 1), ('delete', 'x'), ('create', 'x', 'b', 2), ('history', 'x')]
Expected Output: [[('created', 'a', 1), ('deleted',)]]
Explanation: The second create is ignored because task IDs cannot be reused.
Hints
- An append-only list per task_id is enough to preserve chronological order.
- Treat reassignment as two history events: unassign the old user, then assign the new one.
Part 5: Find Duplicate Files
Constraints
- 0 <= len(files) <= 100000
- The sum of all content lengths is at most 10^6 bytes in the test data
- Each path is unique
Examples
Input: [('/a.txt', b'abc', True, False), ('/b.txt', b'abc', True, False), ('/c.txt', b'', True, False), ('/d.txt', b'', True, False), ('/e.txt', b'xyz', True, False)]
Expected Output: [['/a.txt', '/b.txt'], ['/c.txt', '/d.txt']]
Explanation: There are two duplicate-content groups: one for b'abc' and one for empty files.
Input: [('/a', b'x', True, False), ('/b', b'x', False, False), ('/c', b'x', True, False), ('/d', b'x', True, True)]
Expected Output: [['/a', '/c']]
Explanation: Unreadable files and symlinks are ignored, so only /a and /c count.
Input: []
Expected Output: []
Explanation: Edge case: no files.
Input: [('/a', b'ab', True, False), ('/b', b'cd', True, False), ('/c', b'ab', True, False)]
Expected Output: [['/a', '/c']]
Explanation: Files /a and /c have identical content. /b has the same size but different bytes.
Hints
- Files with different lengths can never be duplicates, so size is a useful first filter.
- Choose a deterministic ordering for both paths within a group and the groups themselves.