Design an in-memory cloud file system
Company: Meta
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Take-home Project
Quick Answer: This question evaluates understanding of object-oriented design, stateful data structures, and resource management including user quotas, ownership semantics, and file metadata operations.
Constraints
- 1 <= len(operations) <= 10^5
- 0 <= capacity <= 10^9
- 1 <= size <= 10^9
- 0 <= n <= 10^5
- 1 <= len(user_id), len(name) <= 50
- The sum of lengths of all file names appearing in the input is at most 2 * 10^5
- `admin` exists from the start, cannot be re-added, and cannot be removed by `merge_user(x, 'admin')`
- No successful test operation requires returning admin's infinite remaining capacity
Examples
Input: [('add_user', 'alice', 10), ('add_user', 'alice', 5), ('add_file', 'sys.log', 4), ('add_file_by', 'alice', 'a.txt', 3), ('add_file_by', 'alice', 'a.txt', 2), ('add_file_by', 'alice', 'b.txt', 8), ('get_file_size', 'sys.log'), ('get_n_largest', '', 3), ('delete_file', 'a.txt'), ('add_file_by', 'alice', 'b.txt', 8), ('get_n_largest', '', 5)]
Expected Output: [True, False, True, 7, None, None, 4, ['sys.log(4)', 'a.txt(3)'], 3, 2, ['b.txt(8)', 'sys.log(4)']]
Explanation: Covers duplicate users, duplicate file names, insufficient capacity, deleting a file to refund space, and querying the largest files across the whole system with an empty prefix.
Input: [('add_user', 'u1', 5), ('add_user', 'u2', 10), ('add_file_by', 'u1', 'cat', 3), ('add_file_by', 'u2', 'car', 4), ('add_file_by', 'u2', 'cap', 4), ('get_n_largest', 'ca', 5), ('merge_user', 'u1', 'u2'), ('delete_file', 'car'), ('get_n_largest', 'ca', 5), ('merge_user', 'u1', 'u1'), ('merge_user', 'u1', 'u2')]
Expected Output: [True, True, 2, 6, 2, ['cap(4)', 'car(4)', 'cat(3)'], 4, 4, ['cap(4)', 'cat(3)'], None, None]
Explanation: Files with prefix 'ca' are sorted by size descending, then name ascending for ties. After merging, u1 gains u2's remaining capacity and ownership of u2's files.
Input: [('add_user', 'alice', 10), ('add_user', 'bob', 10), ('add_file_by', 'alice', 'doc', 4), ('add_file_by', 'alice', 'img', 3), ('backup_user', 'alice'), ('delete_file', 'doc'), ('add_file_by', 'bob', 'doc', 2), ('add_file_by', 'alice', 'tmp', 2), ('restore_user', 'alice'), ('get_n_largest', '', 10), ('restore_user', 'bob')]
Expected Output: [True, True, 6, 3, 2, 4, 8, 5, 1, ['img(3)', 'doc(2)'], 0]
Explanation: Alice backs up two files. Later, Bob takes the name 'doc'. On restore, Alice's current files are removed first, then only 'img' is restored because 'doc' is already used by Bob.
Input: [('add_user', 'admin', 5), ('add_file', 'root', 1), ('get_file_size', 'missing'), ('delete_file', 'missing'), ('get_n_largest', 'zzz', 3), ('add_file_by', 'ghost', 'x', 1), ('add_user', 'eve', 0), ('backup_user', 'eve'), ('add_file_by', 'eve', 'tiny', 1), ('restore_user', 'eve'), ('merge_user', 'eve', 'admin')]
Expected Output: [False, True, None, None, [], None, True, 0, None, 0, None]
Explanation: Covers missing files, nonexistent users, an empty prefix result, a user with zero capacity, backing up an empty file set, restoring from an empty backup, and the invalid case of trying to merge away the built-in admin user.
Solution
def solution(operations):
from collections import defaultdict
ADMIN = 'admin'
INF = float('inf')
users = {ADMIN: INF} # user_id -> remaining capacity
files = {} # file_name -> (size, owner)
user_files = defaultdict(set) # user_id -> set of owned file names
user_files[ADMIN] = set()
backups = {} # user_id -> {file_name: size}
prefix_map = defaultdict(set) # prefix -> set of file names
def add_to_prefixes(name):
prefix_map[''].add(name)
prefix = ''
for ch in name:
prefix += ch
prefix_map[prefix].add(name)
def remove_from_prefixes(name):
names = prefix_map.get('')
if names is not None:
names.discard(name)
if not names:
prefix_map.pop('', None)
prefix = ''
for ch in name:
prefix += ch
names = prefix_map.get(prefix)
if names is not None:
names.discard(name)
if not names:
prefix_map.pop(prefix, None)
def create_file(owner, name, size, check_capacity=True):
if owner not in users or name in files:
return None
if owner != ADMIN and check_capacity and size > users[owner]:
return None
if owner != ADMIN:
users[owner] -= size
files[name] = (size, owner)
user_files[owner].add(name)
add_to_prefixes(name)
return users[owner] if owner != ADMIN else INF
def remove_file(name):
if name not in files:
return None
size, owner = files.pop(name)
user_files[owner].remove(name)
if owner != ADMIN:
users[owner] += size
remove_from_prefixes(name)
return size
results = []
for op in operations:
kind = op[0]
if kind == 'add_user':
_, user_id, capacity = op
if user_id in users:
results.append(False)
else:
users[user_id] = capacity
user_files[user_id] = set()
results.append(True)
elif kind == 'add_file':
_, name, size = op
if name in files:
results.append(False)
else:
create_file(ADMIN, name, size, check_capacity=False)
results.append(True)
elif kind == 'add_file_by':
_, user_id, name, size = op
results.append(create_file(user_id, name, size, check_capacity=True))
elif kind == 'get_file_size':
_, name = op
results.append(files[name][0] if name in files else None)
elif kind == 'delete_file':
_, name = op
results.append(remove_file(name))
elif kind == 'get_n_largest':
_, prefix, n = op
if n <= 0:
results.append([])
else:
matching = prefix_map.get(prefix, set())
ordered = sorted(matching, key=lambda fname: (-files[fname][0], fname))
results.append([f'{fname}({files[fname][0]})' for fname in ordered[:n]])
elif kind == 'merge_user':
_, user1, user2 = op
if user1 not in users or user2 not in users or user1 == user2 or user2 == ADMIN:
results.append(None)
else:
for fname in list(user_files[user2]):
size, _ = files[fname]
files[fname] = (size, user1)
user_files[user1].add(fname)
user_files.pop(user2, None)
if user1 != ADMIN:
users[user1] += users[user2]
users.pop(user2, None)
backups.pop(user2, None)
results.append(users[user1] if user1 != ADMIN else INF)
elif kind == 'backup_user':
_, user_id = op
if user_id not in users:
results.append(None)
else:
snapshot = {fname: files[fname][0] for fname in user_files[user_id]}
backups[user_id] = snapshot
results.append(len(snapshot))
elif kind == 'restore_user':
_, user_id = op
if user_id not in users:
results.append(None)
elif user_id not in backups:
results.append(0)
else:
snapshot = backups[user_id]
for fname in list(user_files[user_id]):
remove_file(fname)
restored = 0
for fname, size in snapshot.items():
if fname not in files:
create_file(user_id, fname, size, check_capacity=False)
restored += 1
results.append(restored)
else:
raise ValueError(f'Unknown operation: {kind}')
return results
Time complexity: O(total name-length updates for add/delete/restore + total files touched by merge/backup/restore + sum(k log k) over all `get_n_largest` calls, where k is the number of files matching that prefix). Space complexity: O(number of users + number of files + total stored prefixes + total size of the latest backups).
Hints
- Use multiple hash maps: one for `file -> (size, owner)`, one for `user -> remaining capacity`, and one for `user -> owned file names`.
- For `get_n_largest(prefix, n)`, avoid scanning every file each time by maintaining a mapping from every prefix to the set of file names currently using that prefix.