Extend cloud file system with copy and compression
Company: Meta
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Take-home Project
Quick Answer: This question evaluates understanding of data structures and algorithms for stateful resource management, covering bookkeeping of per-user quotas, file metadata operations (copy, compress/decompress), string filtering and sorting, and efficient selection and deletion of largest elements.
Constraints
- 0 <= len(queries) <= 10^5
- User capacities and file sizes in add operations are integers in the range [0, 10^9], and file names are unique among live files
- Use lexicographically smaller name first when update_capacity must delete among files with equal size
Examples
Input: [('add_user', 'u1', 15), ('add_file_by', 'u1', 'fileA', 4), ('add_file_by', 'u1', 'fileB', 5), ('copy_file', 'fileA', 'fileA_copy'), ('get_file_size', 'fileA_copy'), ('get_n_largest', 'file', '', 3)]
Expected Output: [True, True, True, True, 4, ['fileB(5)', 'fileA(4)', 'fileA_copy(4)']]
Explanation: The copy succeeds because u1 still has enough remaining capacity. The largest matching files are ordered by size descending, then name ascending.
Input: [('add_user', 'u1', 10), ('add_file_by', 'u1', 'alpha', 4), ('add_file_by', 'u1', 'beta', 4), ('add_file_by', 'u1', 'gamma', 2), ('update_capacity', 'u1', 6), ('get_file_size', 'alpha'), ('get_n_largest', '', '', 5), ('add_file_by', 'u1', 'delta', 1)]
Expected Output: [True, True, True, True, True, None, ['beta(4)', 'gamma(2)'], False]
Explanation: After lowering capacity to 6, one size-4 file must be deleted. alpha and beta tie in size, so alpha is removed first because its name is lexicographically smaller.
Input: [('add_user', 'u1', 5), ('add_file_by', 'u1', 'doc', 4), ('compress_file', 'doc'), ('compress_file', 'doc.cmp'), ('get_file_size', 'doc'), ('get_file_size', 'doc.cmp'), ('decompress_file', 'doc.cmp'), ('add_file_by', 'u1', 'extra', 2)]
Expected Output: [True, True, True, False, None, 2, True, False]
Explanation: doc is replaced by doc.cmp of size 2. Compressing an already compressed file fails. Decompression restores doc to size 4, leaving no room for extra.
Input: [('add_user', 'u1', 8), ('add_file_by', 'u1', 'a', 4), ('compress_file', 'a'), ('add_file_by', 'u1', 'a', 5), ('decompress_file', 'a.cmp'), ('get_n_largest', '', '', 5)]
Expected Output: [True, True, True, True, False, ['a(5)', 'a.cmp(2)']]
Explanation: After compressing a into a.cmp, the original name becomes free and can be reused. Decompression then fails because the target name a already exists.
Input: [('add_user', 'u1', 3), ('add_file_by', 'u2', 'ghost', 1), ('add_file', 'sys', 7), ('copy_file', 'sys', 'sys_backup'), ('get_n_largest', 'sys', '', 5), ('update_capacity', 'nobody', 1)]
Expected Output: [True, False, True, True, ['sys(7)', 'sys_backup(7)'], False]
Explanation: Adding a file for a missing user fails. admin can add and copy files. With equal sizes, sys comes before sys_backup because of lexicographic order.
Input: []
Expected Output: []
Explanation: No queries means no results.
Solution
def solution(queries):
import heapq
SUFFIX = '.cmp'
ADMIN_CAP = 10 ** 18
users = {'admin': {'cap': ADMIN_CAP, 'used': 0}}
files = {}
heaps = {'admin': []}
def add_owned(user_id, name, size):
if user_id not in users or name in files:
return False
if users[user_id]['used'] + size > users[user_id]['cap']:
return False
files[name] = (size, user_id)
users[user_id]['used'] += size
heaps.setdefault(user_id, [])
heapq.heappush(heaps[user_id], (-size, name))
return True
answers = []
for query in queries:
op = query[0]
if op == 'add_user':
_, user_id, capacity = query
if user_id in users:
answers.append(False)
else:
users[user_id] = {'cap': capacity, 'used': 0}
heaps[user_id] = []
answers.append(True)
elif op == 'add_file_by':
_, user_id, name, size = query
answers.append(add_owned(user_id, name, size))
elif op == 'add_file':
_, name, size = query
answers.append(add_owned('admin', name, size))
elif op == 'copy_file':
_, from_name, to_name = query
if from_name not in files or to_name in files:
answers.append(False)
else:
size, owner = files[from_name]
answers.append(add_owned(owner, to_name, size))
elif op == 'get_file_size':
_, name = query
answers.append(files[name][0] if name in files else None)
elif op == 'get_n_largest':
_, prefix, suffix, n = query
if n <= 0:
answers.append([])
continue
matched = []
for name, (size, _) in files.items():
if name.startswith(prefix) and name.endswith(suffix):
matched.append((name, size))
matched.sort(key=lambda item: (-item[1], item[0]))
answers.append([f"{name}({size})" for name, size in matched[:n]])
elif op == 'update_capacity':
_, user_id, new_capacity = query
if user_id not in users:
answers.append(False)
else:
users[user_id]['cap'] = new_capacity
heap = heaps.setdefault(user_id, [])
while users[user_id]['used'] > new_capacity:
while heap:
neg_size, name = heapq.heappop(heap)
size = -neg_size
info = files.get(name)
if info is not None and info[0] == size and info[1] == user_id:
del files[name]
users[user_id]['used'] -= size
break
else:
break
answers.append(True)
elif op == 'compress_file':
_, name = query
if name not in files or name.endswith(SUFFIX):
answers.append(False)
else:
new_name = name + SUFFIX
if new_name in files:
answers.append(False)
else:
size, owner = files[name]
new_size = size // 2
del files[name]
users[owner]['used'] -= size
files[new_name] = (new_size, owner)
users[owner]['used'] += new_size
heaps.setdefault(owner, [])
heapq.heappush(heaps[owner], (-new_size, new_name))
answers.append(True)
elif op == 'decompress_file':
_, name = query
if name not in files or not name.endswith(SUFFIX):
answers.append(False)
else:
size, owner = files[name]
new_name = name[:-len(SUFFIX)]
new_size = size * 2
if new_name in files or users[owner]['used'] - size + new_size > users[owner]['cap']:
answers.append(False)
else:
del files[name]
users[owner]['used'] -= size
files[new_name] = (new_size, owner)
users[owner]['used'] += new_size
heaps.setdefault(owner, [])
heapq.heappush(heaps[owner], (-new_size, new_name))
answers.append(True)
else:
raise ValueError(f'Unknown operation: {op}')
return answersTime complexity: Amortized O(log F) for add/copy/compress/decompress operations, O(k log F) for update_capacity when k files are deleted, and O(F log F) for each get_n_largest query, where F is the number of live files. Space complexity: O(F + U), where F is the number of live files and U is the number of users.
Hints
- Use hash maps for users and files so that existence checks, ownership lookups, and size lookups are O(1).
- For update_capacity, keep a max-heap per user keyed by (-size, name). Because files can be deleted or renamed, use lazy deletion when popping from the heap.