Design cloud storage system
In-Memory Cloud Storage: Design and Implement
You are asked to design and implement an in-memory cloud storage system that maintains a mapping from file names to their metadata (size) and supports both single-user and multi-user operations.
Assumptions to make explicit:
-
File names are unique per user. The early single-user functions operate in a special default user namespace.
-
Sizes are non-negative integers (bytes). Capacity is the maximum total size (sum of file sizes) allowed per user.
-
For get_n_largest(prefix, n), return the n largest files across all users whose names start with prefix. Each result should identify the user.
-
merge_user(u1, u2) combines user2 into user1; user1 remains, user2 is deleted. Capacity becomes the sum of both users' capacities.
-
If file name collisions occur during merge, rename the incoming conflicting file(s) from user2 by appending a suffix to ensure uniqueness (e.g., "name (merged 2)").
-
backup_user and restore_user operate per user and do not affect other users' files or capacities.
Required API
Single-user (default namespace):
-
add_file(name, size)
-
get_file_size(name) → size or None
-
delete_file(name)
-
get_n_largest(prefix, n) → list of (user_id, name, size)
Multi-user:
-
add_user(user_id, capacity)
-
add_file_by(user_id, name, size) // enforce per-user capacity limits
-
merge_user(user_id1, user_id2) // combine users and their files; capacity sums; rename on conflicts
-
backup_user(user_id) → version_id // versioned backups per user
-
restore_user(user_id, version_id=None) // restore to a saved version (latest if None)
Deliverables
-
Data structures and algorithms to support the operations.
-
Clear semantics for edge cases (e.g., capacity enforcement, file collisions on merge, restore behavior).
-
Time/space complexity discussion and potential optimizations for prefix queries.
-
A working in-memory implementation (any mainstream language) with small usage examples/tests.
Constraints & Assumptions
-
Preserve the scope, facts, inputs, and requested outputs from the prompt above.
-
If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
-
Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.
Clarifying Questions to Ask
-
Clarify users, core use cases, read/write patterns, scale, latency, availability, and data retention.
-
State explicit assumptions before making sizing or architecture decisions.
-
Prioritize the functional path first, then address reliability, security, observability, and rollout.
What a Strong Answer Covers
-
A scoped requirements summary with concrete non-goals and success metrics.
-
API, data model, architecture, consistency, capacity, and operations.
-
Reasoned trade-offs among simple and scalable designs, including bottlenecks and failure modes.
-
A validation, monitoring, migration, and launch plan appropriate for the risk level.
Follow-up Questions
-
What breaks first at 10x traffic or data volume?
-
How would you degrade gracefully during dependency failures?
-
What metrics and alerts would prove the design is healthy after launch?