Implement a Cloud Storage Service
Company: Tradedesk
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Take-home Project
Implement an in-memory cloud storage system. Build the class incrementally; later levels must preserve all behavior from earlier levels. File names are globally unique strings, and file sizes are non-negative integers. If an operation cannot be completed, return the failure value specified for that method.
### Level 1: Basic file operations
Implement:
- `add_file(name: str, size: int) -> bool`
- Add a new file with the given name and size.
- Return `true` if the file was added.
- Return `false` if a file with the same name already exists.
- `copy_file(name_from: str, name_to: str) -> bool`
- Copy an existing file to a new file name.
- Return `true` if the copy succeeds.
- Return `false` if the source file does not exist or the destination file already exists.
- `get_file_size(name: str) -> int | None`
- Return the file size if the file exists.
- Return `None` if the file does not exist.
### Level 2: File search
Implement:
- `find_file(prefix: str, suffix: str) -> list[str]`
- Return all files whose names start with `prefix` and end with `suffix`.
- Sort results by file size in descending order.
- If two files have the same size, sort them by file name in lexicographically ascending order.
- Format each result as `name(size)`, for example `logs/a.txt(120)`.
### Level 3: Users and capacity limits
The system has users. Each user has a storage capacity and current usage. There is also a built-in `admin` user with unlimited capacity. Calls to `add_file` from Level 1 should behave as if the `admin` user added the file.
Implement:
- `add_user(user_id: str, capacity: int) -> bool`
- Add a user with the given capacity and zero initial usage.
- Return `true` if the user was added.
- Return `false` if the user already exists.
- `add_file_by(user_id: str, name: str, size: int) -> int | None`
- Add a file owned by `user_id`.
- Return the user's remaining capacity after adding the file.
- Return `None` if the user does not exist, the file name already exists, or adding the file would exceed the user's capacity.
- Update `copy_file(name_from: str, name_to: str) -> bool`
- The copied file must have the same owner as the source file.
- The copy consumes additional capacity from that owner.
- Return `false` if the owner would exceed their capacity.
- `update_capacity(user_id: str, capacity: int) -> int | None`
- Set the user's capacity to the new value.
- If the user's current usage exceeds the new capacity, remove that user's files until usage is less than or equal to the capacity.
- Remove files in order of largest size first; for ties, remove lexicographically smaller file names first.
- Return the number of removed files.
- Return `None` if the user does not exist.
### Level 4: Compression and decompression
Compressed files use the suffix `.COMPRESSED`.
Implement:
- `compress_file(user_id: str, name: str) -> int | None`
- The file must exist and be owned by `user_id`.
- Replace the file named `name` with a compressed file named `name.COMPRESSED`.
- The compressed file has size `floor(original_size / 2)` and the same owner.
- Update the user's usage accordingly.
- Return the user's remaining capacity.
- Return `None` if the file does not exist, is not owned by the user, or the compressed destination name already exists.
- `decompress_file(user_id: str, name: str) -> int | None`
- The file must exist, be owned by `user_id`, and have the suffix `.COMPRESSED`.
- Replace it with the original file name obtained by removing the `.COMPRESSED` suffix.
- The decompressed size is `2 * compressed_size`.
- Return `None` if the original file name already exists or decompression would exceed the user's capacity.
- Otherwise, update the user's usage and return the user's remaining capacity.
Quick Answer: This question evaluates a candidate's ability to implement an in-memory cloud storage system, exercising skills in data structures and stateful API design, string handling and pattern matching, sorting and eviction algorithms, and per-user resource accounting.
Part 1: Basic File Operations in an In-Memory Cloud Store
You are given a sequence of queries for a very small in-memory cloud storage service. File names are globally unique strings, and file sizes are non-negative integers. Process the queries in order and return the result of each query.
Supported operations:
- ('add_file', name, size) -> add a new file and return True if successful, otherwise False if the name already exists.
- ('copy_file', name_from, name_to) -> copy an existing file to a new name and return True if successful, otherwise False if the source does not exist or the destination already exists.
- ('get_file_size', name) -> return the file size, or None if the file does not exist.
Constraints
- 1 <= len(queries) <= 2 * 10^5, but an empty list should also be handled correctly.
- File names are strings and are globally unique when present.
- File sizes are integers in the range 0 to 10^9.
- All operations must be processed in order.
Examples
Input: [('add_file', 'a.txt', 10), ('add_file', 'b.txt', 20), ('copy_file', 'a.txt', 'c.txt'), ('get_file_size', 'c.txt'), ('add_file', 'a.txt', 5), ('copy_file', 'missing', 'x'), ('get_file_size', 'missing')]
Expected Output: [True, True, True, 10, False, False, None]
Explanation: Covers successful add, successful copy, duplicate add, failed copy from a missing file, and missing lookup.
Input: []
Expected Output: []
Explanation: Edge case: no queries.
Input: [('add_file', 'data', 0), ('copy_file', 'data', 'data'), ('get_file_size', 'data')]
Expected Output: [True, False, 0]
Explanation: Edge case: zero-sized file and copying to an existing destination name.
Input: [('copy_file', 'ghost', 'copy'), ('add_file', 'z', 7), ('get_file_size', 'z')]
Expected Output: [False, True, 7]
Explanation: Trying to copy before any file exists should fail.
Hints
- A hash map from file name to file size is enough for all three operations.
- For copy_file, check both failure conditions before inserting the new file.
Part 2: Search Files by Prefix and Suffix
Build an in-memory cloud storage service that supports basic file operations plus search. File names are globally unique strings, and file sizes are non-negative integers. Process all queries in order and return the result of each query.
Supported operations:
- ('add_file', name, size) -> return True if the file was added, else False.
- ('copy_file', name_from, name_to) -> return True if the copy succeeds, else False.
- ('get_file_size', name) -> return the size, or None if the file does not exist.
- ('find_file', prefix, suffix) -> return all files whose names start with prefix and end with suffix.
For find_file, sort matching files by:
1. file size in descending order
2. file name in lexicographically ascending order when sizes are equal
Format each matching file as 'name(size)'.
Constraints
- 1 <= len(queries) <= 10^5, but an empty list should still be handled.
- File sizes are integers in the range 0 to 10^9.
- File names, prefixes, and suffixes are case-sensitive strings.
- A straightforward scan over all files for each find_file query is acceptable for this problem.
Examples
Input: [('add_file', 'a.txt', 10), ('add_file', 'ab.txt', 30), ('add_file', 'ac.log', 30), ('add_file', 'archive.txt', 30), ('copy_file', 'a.txt', 'aa.txt'), ('find_file', 'a', '.txt'), ('find_file', 'ab', '.txt'), ('find_file', 'x', '.txt')]
Expected Output: [True, True, True, True, True, ['ab.txt(30)', 'archive.txt(30)', 'a.txt(10)', 'aa.txt(10)'], ['ab.txt(30)'], []]
Explanation: Tests filtering by both prefix and suffix, sorting by size descending, and lexicographic tie-breaking.
Input: [('find_file', '', '')]
Expected Output: [[]]
Explanation: Edge case: searching an empty storage should return an empty list.
Input: [('add_file', 'pre_suf', 7), ('add_file', 'pre-mid-suf', 7), ('add_file', 'pre-end', 5), ('find_file', 'pre', 'suf')]
Expected Output: [True, True, True, ['pre-mid-suf(7)', 'pre_suf(7)']]
Explanation: Two files match with equal sizes, so lexicographic file-name order decides the result.
Input: [('add_file', 'note', 5), ('copy_file', 'note', 'note'), ('copy_file', 'missing', 'copy'), ('get_file_size', 'copy'), ('find_file', 'n', 'e')]
Expected Output: [True, False, False, None, ['note(5)']]
Explanation: Includes failed copy cases and a simple successful search.
Hints
- You can still store files in a dictionary from name to size.
- For find_file, first collect matches, then sort them with a key like (-size, name).
Part 3: Cloud Storage with Users and Capacity Limits
Extend the cloud storage service to support users with storage limits. There is a built-in 'admin' user with unlimited capacity. The operation ('add_file', name, size) behaves as if the admin added the file.
Process queries in order and return the result of each query.
Supported operations:
- ('add_user', user_id, capacity) -> add a user with zero usage; return True if added, else False.
- ('add_file', name, size) -> add a file owned by admin; return True if added, else False.
- ('add_file_by', user_id, name, size) -> add a file owned by user_id; return remaining capacity, or None on failure.
- ('copy_file', name_from, name_to) -> copy an existing file to a new name with the same owner; return True on success, else False. If the owner is not admin, the copy must fit in that owner's capacity.
- ('get_file_size', name) -> return the size, or None if missing.
- ('update_capacity', user_id, capacity) -> set a user's new capacity. If usage now exceeds capacity, remove that user's files until usage is <= capacity. Remove files by largest size first; when sizes tie, remove lexicographically smaller names first. Return the number of removed files, or None if the user does not exist.
For this standalone problem, queries using user-specific operations will not use 'admin' as user_id.
Constraints
- 1 <= len(queries) <= 2 * 10^5, but an empty list should be handled correctly.
- User capacities and file sizes are integers in the range 0 to 10^9.
- File names are globally unique across all users.
- For user-specific operations in this standalone version, user_id is never 'admin'.
Examples
Input: [('add_user', 'alice', 10), ('add_file_by', 'alice', 'a', 4), ('copy_file', 'a', 'a2'), ('copy_file', 'a', 'a3'), ('add_file', 'admin_file', 100), ('get_file_size', 'admin_file')]
Expected Output: [True, 6, True, False, True, 100]
Explanation: Alice can store one copy of 'a' but not a second copy because it would exceed her capacity.
Input: [('add_user', 'bob', 25), ('add_file_by', 'bob', 'b', 8), ('add_file_by', 'bob', 'a', 8), ('add_file_by', 'bob', 'c', 5), ('update_capacity', 'bob', 13), ('get_file_size', 'a'), ('get_file_size', 'b'), ('get_file_size', 'c')]
Expected Output: [True, 17, 9, 4, 1, None, 8, 5]
Explanation: Bob's usage drops from 21 to 13 by removing only one file. Between 'a' and 'b' (both size 8), 'a' is removed because it is lexicographically smaller.
Input: [('add_user', 'u1', 0), ('add_user', 'u1', 5), ('add_file_by', 'ghost', 'x', 1), ('add_file_by', 'u1', 'x', 0), ('add_file', 'x', 5), ('copy_file', 'x', 'y'), ('get_file_size', 'y')]
Expected Output: [True, False, None, 0, False, True, 0]
Explanation: Edge cases: duplicate user, missing user, zero-capacity user, zero-sized file, and copying a zero-sized file.
Input: []
Expected Output: []
Explanation: Edge case: no queries.
Hints
- Track each file as both size and owner, not just size.
- When capacity decreases, gather that user's files and sort by (-size, name) to remove them in the required order.
Part 4: Cloud Storage with Compression and Decompression
Build an in-memory cloud storage service with users, capacity limits, and file compression. There is a built-in 'admin' user with unlimited capacity. The operation ('add_file', name, size) behaves as if admin added the file.
Process queries in order and return the result of each query.
Supported operations:
- ('add_user', user_id, capacity) -> return True if the user is added, else False.
- ('add_file', name, size) -> add an admin-owned file; return True if added, else False.
- ('add_file_by', user_id, name, size) -> add a user-owned file; return remaining capacity, or None on failure.
- ('copy_file', name_from, name_to) -> copy a file with the same owner; return True on success, else False if missing, destination exists, or capacity would be exceeded.
- ('get_file_size', name) -> return the size, or None if missing.
- ('update_capacity', user_id, capacity) -> update a user's capacity and remove that user's files if needed, using largest-size-first and lexicographically-smaller-name-first tie-breaking; return number of removed files, or None if the user does not exist.
- ('compress_file', user_id, name) -> replace file name with name + '.COMPRESSED'. New size is floor(old_size / 2). Same owner. Return remaining capacity, or None if the file does not exist, is not owned by user_id, or the destination compressed name already exists.
- ('decompress_file', user_id, name) -> replace a '.COMPRESSED' file with its original name by removing the suffix. New size is 2 * compressed_size. Return remaining capacity, or None if the file does not exist, is not owned by user_id, does not end with '.COMPRESSED', the original name already exists, or decompression would exceed capacity.
For this standalone problem, user-specific operations will not use 'admin' as user_id.
Constraints
- 1 <= len(queries) <= 2 * 10^5, but an empty list should still be handled correctly.
- User capacities and file sizes are integers in the range 0 to 10^9.
- File names are globally unique across all users.
- For user-specific operations in this standalone version, user_id is never 'admin'.
Examples
Input: [('add_user', 'alice', 10), ('add_file_by', 'alice', 'report', 6), ('compress_file', 'alice', 'report'), ('get_file_size', 'report'), ('get_file_size', 'report.COMPRESSED'), ('decompress_file', 'alice', 'report.COMPRESSED'), ('get_file_size', 'report')]
Expected Output: [True, 4, 7, None, 3, 4, 6]
Explanation: A successful compress/decompress round trip. The file is renamed and resized each time.
Input: [('add_user', 'bob', 5), ('add_file_by', 'bob', 'x', 5), ('compress_file', 'bob', 'x'), ('add_file_by', 'bob', 'y', 3), ('decompress_file', 'bob', 'x.COMPRESSED'), ('get_file_size', 'x.COMPRESSED')]
Expected Output: [True, 0, 3, 0, None, 2]
Explanation: Edge case: decompression fails because restoring the larger file would exceed Bob's capacity.
Input: [('add_user', 'carol', 20), ('add_file_by', 'carol', 'a', 8), ('add_file_by', 'carol', 'a.COMPRESSED', 1), ('compress_file', 'carol', 'a'), ('add_file', 'admin_doc', 4), ('compress_file', 'carol', 'admin_doc'), ('decompress_file', 'carol', 'a')]
Expected Output: [True, 12, 11, None, True, None, None]
Explanation: Covers destination-name collision during compression, wrong owner, and invalid decompression of a non-compressed file.
Input: [('add_user', 'd', 10), ('add_file_by', 'd', 'm', 9), ('compress_file', 'd', 'm'), ('copy_file', 'm.COMPRESSED', 'm2.COMPRESSED'), ('update_capacity', 'd', 5), ('get_file_size', 'm.COMPRESSED'), ('get_file_size', 'm2.COMPRESSED')]
Expected Output: [True, 1, 6, True, 1, None, 4]
Explanation: After copying a compressed file, lowering capacity forces one removal. Between equal-size files, the lexicographically smaller name is removed first.
Hints
- Store each file as (size, owner). Compression and decompression are really just controlled rename-and-resize operations.
- For decompression, remember that capacity is checked after replacing the compressed file, not in addition to keeping both files.