Design an in-memory cloud storage system

Q: Design an in-memory cloud storage system

This question evaluates system design competencies including API design, in-memory data modeling, efficient data structures for top-k queries, quota and merge semantics, snapshot and restore strategies, and time/space complexity analysis.

Q: How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

Q: What difficulty level is this interview question?

This is a hard difficulty System Design question, commonly asked during Take-home Project rounds at Meta.

Q: What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Meta during technical interviews.

Question

In-Memory Cloud Storage Service (Take-home)

Design and implement an in-memory cloud storage service that maps files (objects) to their metadata. The system maintains files and information about them — at minimum a name, a size, and arbitrary metadata. The project is staged in four levels; the deliverables at each level build on the prior ones.

This is a single-process, in-memory store: there is no real filesystem, no disk persistence, and no network. Focus on clear APIs, correctness, time/space complexity, and clean, runnable core implementations. Each level should be implemented so that all of its behaviors are exercisable in code (pseudocode or a real language is fine).

Constraints & Assumptions

In-memory only. No persistence to a real filesystem is required. You never store the file bytes — size is just an integer; only metadata is kept.
Single process / single thread. You do not need to implement concurrency control, but you should note where locking would go.
Metadata is an arbitrary key→value map supplied by the caller; you do not interpret its contents.
The levels are cumulative: by Level 3 your model has users; Levels 2 and 4 must be consistent with that model.
Every operation must come with a stated time and space complexity , and every mutation must preserve clearly-stated invariants (e.g. per-user byte usage equals the sum of that user's file sizes).
Prefer deterministic behavior everywhere ordering or conflict-resolution is involved (no reliance on hash/iteration order).

Clarifying Questions to Ask

How are file identifiers assigned — server-minted (e.g. UUIDs) or caller-supplied? This decides whether "duplicate identifier" is even a possible conflict.
Are file names unique globally, unique per user, or not unique at all? This determines what a "name conflict" means during merge and restore.
Are files mutable after creation (is there an update API), or is re-upload modeled as delete + add?
Should get / delete on a missing identifier raise an error or be idempotent (return null / false)?
For top-k , do callers want the result strictly ordered, and what is the required tie-break when sizes are equal?
For merge and restore , is the expected default all-or-nothing (atomic) or best-effort (apply what fits) , and how should quota interact with each?

Part 1 — Core CRUD

Provide APIs to:

Add a file — store at least its name, size, and arbitrary metadata; return the created identifier.
Retrieve a file by identifier.
Delete a file by identifier.

Define the request/response schema for each API (arguments in, values out), the error model (how not-found, invalid input, and conflicts such as a duplicate name are surfaced), and the time/space complexity of each operation.

What This Part Should Cover

A precise schema for each call and which fields are required vs optional; what add returns.
A typed error model that distinguishes invalid input, not-found, and conflict (and a clear choice of whether get/delete are lenient/idempotent or raise).
Correct $O(1)$ get/delete and the cost of add, with the invariants that the indexes maintain.

Part 2 — Top-k Largest Files

Add APIs to query:

The k largest files globally .
The k largest files per user .

Specify the tie-breaker when two files have equal size (e.g. by name, then by identifier) so the result is a total, deterministic order, and justify your data-structure and complexity choices.

What This Part Should Cover

A fully specified, deterministic tie-break order and how the chosen structure realizes it.
The data structure for global vs per-user top-k and the per-query complexity in terms of $k$ and $n$ .
How deletes (and, later, merges that re-home a file) are reconciled with the index so a query never returns a stale or duplicated entry.

Part 3 — Users and Quotas; Account Merge

Introduce users, each with a storage capacity limit (quota, in bytes). Adding a file must respect the owner's quota. Then implement merging two users' accounts (move one user's files into another).

Define how merge resolves conflicts — including duplicate file identifiers and duplicate names — and decide whether merge is all-or-nothing or best-effort, justifying the choice. Provide complexities.

What This Part Should Cover

Quota enforced on every add and on the merge as a whole, with an explicit usage invariant.
A conflict policy that names which conflicts are even possible (identifier vs name) and resolves names deterministically.
A justified atomic-vs-best-effort choice, with a pre-mutation validation pass for the atomic path, and what the merge returns (e.g. moved / renamed / skipped counts).

Part 4 — Backup and Restore

Support backing up and restoring a user's files.

Define the snapshot semantics (point-in-time vs. rolling/incremental), the storage-overhead trade-off of your snapshot design (e.g. deep copy vs. copy-on-write), and the restore policy: replace vs. merge behavior, conflict handling, and quota enforcement. Provide complexities.

What This Part Should Cover

Clear snapshot semantics and an honest accounting of what is copied vs shared, with the overhead trade-off (deep copy vs COW/incremental).
A replace vs merge restore policy with explicit conflict handling and a quota pre-check, applied before any destructive mutation.
The correctness argument for how restore interacts with an intervening merge (id re-homing) so no other user's account is corrupted.

What a Strong Answer Covers

Across all four levels, a strong submission ties the levels together with a single coherent model rather than four disconnected features:

A small set of data structures (primary store + per-user indexes + ordering index) reused across all levels, each justified by the operation it makes cheap.
Explicit invariants (usage = sum of sizes; name uniqueness within a user; exactly one owner per live id) that every mutation in Parts 1–4 provably preserves.
Consistent complexity reporting in terms of $n$ (total files), $u$ (a user's files), and $k$ , for every operation.
The cross-cutting correctness traps that only appear when levels interact: a merge that re-homes a file must not let top-k double-count it; a restore must not rebind an id a merge moved away; quota checks must be net-of-frees.
A clear error model and deterministic conflict resolution shared by add, merge, and restore.

Follow-up Questions

How would you make this thread-safe ? Where do the locks go, and how do you avoid deadlock when merge touches two users at once?
How would you scale beyond one process ? Sketch sharding by owner, how global top-k is computed across shards, and what merge/restore become.
If snapshots are taken frequently and metadata is large, when would you switch from deep-copy snapshots to copy-on-write or incremental snapshots, and what does that cost at restore time?
How would you support an update-file API (change size/metadata) while keeping the usage invariant and the top-k index correct?

Design an in-memory cloud storage system

Quick Overview

Design an in-memory cloud storage system

In-Memory Cloud Storage Service (Take-home)

Constraints & Assumptions

Clarifying Questions to Ask

Part 1 — Core CRUD

What This Part Should Cover

Part 2 — Top-k Largest Files

What This Part Should Cover

Part 3 — Users and Quotas; Account Merge

What This Part Should Cover

Part 4 — Backup and Restore

What This Part Should Cover

What a Strong Answer Covers

Follow-up Questions

Submit Your Answer to Earn 20XP

Design an in-memory cloud storage system

Quick Overview

Design an in-memory cloud storage system

In-Memory Cloud Storage Service (Take-home)

Constraints & Assumptions

Clarifying Questions to Ask

Part 1 — Core CRUD

What This Part Should Cover

Part 2 — Top-k Largest Files

What This Part Should Cover

Part 3 — Users and Quotas; Account Merge

What This Part Should Cover

Part 4 — Backup and Restore

What This Part Should Cover

What a Strong Answer Covers

Follow-up Questions

Submit Your Answer to Earn 20XP