Design Cloud File Storage | Harvey Interview Question

Company: Harvey

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite

Hints for Part 1

```hint Where to start
The system has two very different kinds of state with conflicting requirements. Ask yourself: which part is large, immutable, and durability-critical, and which part is small, frequently queried, and updated transactionally? Letting those two live in different storage systems, and deciding how one *references* the other, is the crux of the whole design.
```

```hint Keep big bytes off your servers
Streaming gigabytes of file data *through* your application servers is a scaling dead end. Think about how the client could transfer bytes **directly** to/from the storage layer while your service still controls authorization and metadata. What handshake guarantees a file only becomes visible *after* its bytes have safely landed?
```

```hint Folders & sharing
How will you represent a folder tree, and where do access grants live relative to that tree? The big decision is whether permissions **inherit** down the hierarchy, and if they do, what that costs you on every read and every move. Sketch the permission-check order and name the read-time cost.
```

Hints for Part 2

```hint Tracking the pieces
What does your metadata need to record so an ordered set of chunks can be reassembled into one file, and so several versions of the same file can coexist? Think about what uniquely identifies a chunk and what per-chunk facts you must store to verify integrity. Most object stores already expose a primitive that does the parallel-parts-then-commit dance for you.
```

```hint Atomicity & integrity
A multi-chunk upload can't be atomic at the byte layer, so where *can* you make it atomic? Find the single metadata action that flips a file from "in progress" to "servable," and gate it on a precondition. Then think about what happens to bytes from uploads that never finish.
```

Quick Answer: This question evaluates system design competency, specifically understanding storage architectures, the separation of metadata versus blob/object storage, scalability and durability trade-offs, low-latency metadata access, and access-control models.

Design a cloud file storage service similar to Google Drive. Users store and retrieve files through web, desktop, and mobile clients, organize them in folders, and share them with other users.

Your design should support the following capabilities:

Upload files and download files.
Store and retrieve file metadata such as name, size, owner, creation time, last-update time, and storage location.
Organize files into folders (a folder hierarchy).
Sharing and access control — grant other users access to a file or folder at different permission levels.

A specific emphasis of this question: clearly separate the metadata store (information about files) from the blob/object storage layer (the actual file bytes), and explain why this separation matters.
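
A minimal sketch of that separation, assuming a plain dictionary stands in for the metadata database and a small `BlobStore` class stands in for the object store. Every name here (`FileMetadata`, `BlobStore`, `store_file`, `rename_file`) is hypothetical and chosen for illustration only; the point is that the metadata row carries an `object_key` reference while the bytes live elsewhere.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileMetadata:
    """Small row kept in the metadata database (hypothetical schema)."""
    file_id: str
    name: str
    owner_id: str
    size_bytes: int
    created_at: datetime
    updated_at: datetime
    object_key: str  # reference into the blob store; no file bytes live here


class BlobStore:
    """Stand-in for an object store: opaque keys mapped to immutable bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def store_file(blobs, metadata_db, file_id, name, owner_id, data) -> FileMetadata:
    """Write the bytes first, then record a metadata row that points at them."""
    object_key = f"blobs/{file_id}/{hashlib.sha256(data).hexdigest()}"
    blobs.put(object_key, data)
    now = datetime.now(timezone.utc)
    row = FileMetadata(file_id, name, owner_id, len(data), now, now, object_key)
    metadata_db[file_id] = row
    return row


def rename_file(metadata_db, file_id, new_name) -> FileMetadata:
    """A rename is a metadata-only update; the blob and its key are untouched."""
    row = replace(metadata_db[file_id], name=new_name,
                  updated_at=datetime.now(timezone.utc))
    metadata_db[file_id] = row
    return row
```

In this sketch a rename, a permission change, or a folder listing never touches the blob store, which is one reason the two kinds of state can be scaled and tuned separately.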

Constraints & Assumptions

State your own numbers, but a reasonable working set:

Read-heavy workload: downloads/metadata-reads dominate writes (assume roughly 10:1 to 100:1 read:write).
File sizes span a wide range — from a few KB (documents) to multi-GB (videos, datasets).
Durability is paramount: a stored file must effectively never be lost (target many 9s of durability).
High availability for upload and download paths.
Metadata operations (listing a folder, checking permissions, renaming) should be low-latency (tens of ms).
Single global service; assume one logical region first, then discuss multi-region/geo.
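
Since the prompt asks you to state your own numbers, a short back-of-envelope calculation can anchor the discussion. The sketch below is only an illustration: `estimate_load` is a hypothetical helper, and the inputs in the example (8.64 million daily active users, one upload per user per day, 1 MB average file size, a 10:1 read:write ratio) are assumed values picked so the arithmetic is easy to follow.

```python
SECONDS_PER_DAY = 86_400


def estimate_load(daily_active_users, uploads_per_user_per_day,
                  avg_file_bytes, read_write_ratio):
    """Average (not peak) request rates and daily storage growth."""
    uploads_per_day = daily_active_users * uploads_per_user_per_day
    upload_qps = uploads_per_day / SECONDS_PER_DAY
    return {
        "upload_qps": upload_qps,
        "download_qps": upload_qps * read_write_ratio,
        "new_bytes_per_day": uploads_per_day * avg_file_bytes,
    }


# Assumed inputs, chosen only to make the division come out evenly.
example = estimate_load(
    daily_active_users=8_640_000,
    uploads_per_user_per_day=1,
    avg_file_bytes=1_000_000,
    read_write_ratio=10,
)
```

With these assumed inputs the averages come to 100 uploads per second, 1,000 downloads per second, and 8.64 TB of new data per day. Peak traffic is usually a multiple of the average, so it is worth stating the multiplier you assume.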

Clarifying Questions to Ask

What is the expected scale — number of users, total files, total bytes, and upload/download QPS?
What is the maximum file size we must support, and what's the distribution (many small files vs. a few huge ones)?
Do we need file versioning / history, or is overwrite-in-place acceptable?
What consistency guarantee is required after an upload — must a download immediately reflect the just-uploaded version (read-your-writes)?
How rich must sharing be — per-user grants, shareable links, organization-wide access, inherited folder permissions?
Do we need full-text search over file names and/or contents, or just metadata listing?
Are there compliance/residency requirements (encryption at rest, regional data residency, audit logs)?

Part 1 — Core design: metadata store vs. blob store

Design the end-to-end system for the core capabilities above. Define the major components, the data model for metadata, the upload path and the download path, the folder model, and sharing / access control. Make the boundary between the metadata database and the blob/object store explicit, and justify why each kind of data lives where it does.
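
One possible shape for the upload and download paths is a two-phase initiate/complete handshake in which the service authorizes the transfer but never relays the bytes. The sketch below is a toy model under stated assumptions: `ObjectStore` is an in-memory fake, the HMAC signature stands in for a pre-signed URL, and `FileService`, its method names, and the `PENDING`/`AVAILABLE` statuses are all hypothetical.

```python
import hashlib
import hmac
import secrets
import time

SIGNING_KEY = b"demo-only-key"  # assumption: known to the service and the store


def sign(object_key: str, expires_at: int) -> str:
    """Signature that stands in for a pre-signed URL's query parameters."""
    message = f"{object_key}:{expires_at}".encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


class ObjectStore:
    """Fake object store that accepts writes only with a valid grant."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, object_key: str, expires_at: int, signature: str,
            data: bytes) -> None:
        valid = hmac.compare_digest(signature, sign(object_key, expires_at))
        if not valid or time.time() > expires_at:
            raise PermissionError("invalid or expired upload grant")
        self.objects[object_key] = data


class FileService:
    """Metadata-side service: it authorizes transfers but never relays bytes."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.files: dict[str, dict] = {}

    def initiate_upload(self, owner_id: str, name: str):
        """Phase 1: create a PENDING row and hand back a signed upload grant."""
        file_id = secrets.token_hex(8)
        object_key = f"uploads/{file_id}"
        expires_at = int(time.time()) + 900  # 15-minute grant, an arbitrary choice
        self.files[file_id] = {"owner": owner_id, "name": name,
                               "object_key": object_key, "status": "PENDING"}
        return file_id, (object_key, expires_at, sign(object_key, expires_at))

    def complete_upload(self, file_id: str, expected_sha256: str) -> None:
        """Phase 2: flip to AVAILABLE only if the bytes landed and match."""
        row = self.files[file_id]
        # Simplification: a real service would compare a checksum reported by
        # the object store instead of reading the bytes back.
        data = self.store.objects.get(row["object_key"])
        if data is None or hashlib.sha256(data).hexdigest() != expected_sha256:
            raise ValueError("bytes missing or checksum mismatch")
        row.update(status="AVAILABLE", size_bytes=len(data))

    def download_key(self, file_id: str) -> str:
        """Downloads see only files whose upload was completed."""
        row = self.files[file_id]
        if row["status"] != "AVAILABLE":
            raise FileNotFoundError(file_id)
        return row["object_key"]
```

The client calls `initiate_upload`, writes directly to the store with the returned grant, then calls `complete_upload`. Until that second call succeeds, `download_key` refuses to return the object, so a half-finished upload is never visible.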

What This Part Should Cover

A crisp metadata-vs-blob separation with a clear rationale for what lives where, and an object-key reference linking the two.
A concrete metadata data model (users, files/folders, versions, permissions) with the right keys and indexes for folder listing and "shared with me" / permission checks.
An upload path and download path that keep byte transfers off the application servers (pre-signed URLs / direct-to-object-store), with a two-phase initiate→complete handshake that exposes a file only after bytes land.
A coherent folder model (e.g. adjacency list) and a stated approach to inherited permissions, including the read-time cost and the interaction with move.
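
The folder model and inherited permissions from the last item can be sketched with an adjacency list. This is one possible policy, not the only one: it assumes the most permissive grant found on a node or any of its ancestors wins, and the `Tree` class, the `LEVELS` table, and the level names are hypothetical.

```python
from typing import Optional

LEVELS = {"viewer": 1, "editor": 2, "owner": 3}  # hypothetical permission levels


class Tree:
    """Adjacency-list hierarchy: every file or folder stores only its parent."""

    def __init__(self) -> None:
        self.parent: dict[str, Optional[str]] = {}    # node_id -> parent_id
        self.grants: dict[tuple[str, str], str] = {}  # (node_id, user_id) -> level

    def add_node(self, node_id: str, parent_id: Optional[str] = None) -> None:
        self.parent[node_id] = parent_id

    def grant(self, node_id: str, user_id: str, level: str) -> None:
        self.grants[(node_id, user_id)] = level

    def effective_level(self, node_id: str, user_id: str) -> Optional[str]:
        """Walk to the root; the cost is one lookup per ancestor (O(depth))."""
        best: Optional[str] = None
        current: Optional[str] = node_id
        while current is not None:
            level = self.grants.get((current, user_id))
            if level is not None and (best is None or LEVELS[level] > LEVELS[best]):
                best = level
            current = self.parent[current]
        return best

    def move(self, node_id: str, new_parent_id: Optional[str]) -> None:
        """Re-parent a node; inherited access changes with the new ancestors."""
        current = new_parent_id
        while current is not None:  # refuse moves that would create a cycle
            if current == node_id:
                raise ValueError("cannot move a node into itself or a descendant")
            current = self.parent[current]
        self.parent[node_id] = new_parent_id
```

The trade-off shows up directly in the code: a move is a single-row update, but every permission check pays for a walk up the tree. Caching or materializing effective permissions shifts that cost from reads to moves and grant changes.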

Part 2 — Follow-up: very large files via chunking

Files may become very large and should be split into smaller chunks for efficient, resilient transfer. Extend your Part 1 design to support chunked file storage while preserving file metadata and enabling efficient upload and download. Address how chunks are tracked, how a file is reconstructed, and how you keep a partially-uploaded file from ever being served.
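
As a starting point for the chunking discussion, the sketch below splits a byte stream into fixed-size chunks and computes a checksum for each. It assumes a buffered stream (such as an open file or `io.BytesIO`) so that `read` returns full chunks until the end; `Chunk` and `split_into_chunks` are hypothetical names, and SHA-256 is just one reasonable checksum choice.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator


@dataclass(frozen=True)
class Chunk:
    index: int    # position of this chunk within the file
    data: bytes
    sha256: str   # per-chunk checksum used later for integrity checks


def split_into_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[Chunk]:
    """Yield fixed-size chunks in order; only the last one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield Chunk(index, data, hashlib.sha256(data).hexdigest())
        index += 1


# Tiny usage example with a 4-byte chunk size so the result is easy to inspect.
chunks = list(split_into_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4))
```

Real chunk sizes are a tuning choice measured in megabytes rather than bytes, and an object store with a multipart primitive may impose its own minimum part size.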

What This Part Should Cover

Chunk metadata keyed by file + version (ordered index, per-chunk size, object key, and checksum) so an ordered set of parts reassembles into one file and multiple versions coexist.
Parallel multipart upload with independent retry of failed chunks, mapped onto the object store's native multipart primitive.
A single atomic commit point in metadata that flips the file to servable only after every required chunk is present and checksum-verified.
Reassembly on download via an ordered manifest of pre-signed chunk URLs, with per-chunk integrity verification.
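
The chunk metadata, commit gate, and reassembly described above can be sketched as follows. This is a simplified model under stated assumptions: a dictionary plays the object store, `ChunkedUpload` and `reassemble` are hypothetical names, and the status flip in `commit` stands in for what would be a single transactional update in a real metadata database.

```python
import hashlib


class ChunkedUpload:
    """Chunk metadata for one (file_id, version) pair."""

    def __init__(self, file_id: str, version: int, expected_chunks: int) -> None:
        self.file_id = file_id
        self.version = version
        self.expected_chunks = expected_chunks
        self.chunks: dict[int, dict] = {}  # index -> object_key, size, checksum
        self.status = "IN_PROGRESS"

    def record_chunk(self, index: int, object_key: str, size_bytes: int,
                     sha256: str) -> None:
        """Idempotent: retrying a failed chunk overwrites its earlier entry."""
        if not 0 <= index < self.expected_chunks:
            raise IndexError(f"chunk index {index} out of range")
        self.chunks[index] = {"object_key": object_key,
                              "size_bytes": size_bytes, "sha256": sha256}

    def commit(self, blob_store: dict) -> None:
        """Flip to SERVABLE only if every chunk is present and verified."""
        missing = [i for i in range(self.expected_chunks) if i not in self.chunks]
        if missing:
            raise ValueError(f"missing chunks: {missing}")
        for index, meta in self.chunks.items():
            data = blob_store.get(meta["object_key"])
            if data is None or hashlib.sha256(data).hexdigest() != meta["sha256"]:
                raise ValueError(f"chunk {index} missing or corrupt")
        self.status = "SERVABLE"  # the single commit point

    def manifest(self) -> list:
        """Ordered chunk list; refused until the upload has been committed."""
        if self.status != "SERVABLE":
            raise FileNotFoundError(f"{self.file_id} v{self.version}")
        return [self.chunks[i] for i in range(self.expected_chunks)]


def reassemble(manifest: list, blob_store: dict) -> bytes:
    """Fetch chunks in manifest order, verifying each checksum on the way."""
    parts = []
    for meta in manifest:
        data = blob_store[meta["object_key"]]
        if hashlib.sha256(data).hexdigest() != meta["sha256"]:
            raise ValueError(f"corrupt chunk at {meta['object_key']}")
        parts.append(data)
    return b"".join(parts)
```

Chunks may arrive in any order and be retried independently; only `commit` makes the version visible. In a production design the manifest would carry pre-signed URLs per chunk rather than raw keys, and the client would do the verification and concatenation.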

What a Strong Answer Covers

These dimensions span both parts:

Consistency & failure handling across the byte/metadata boundary: partial uploads, blob-committed-but-metadata-failed, orphaned blobs, upload-session expiry, and integrity scrubbing.
Scalability for a read-heavy workload: metadata partitioning/sharding, read replicas, object-store replication/durability, and a CDN for hot downloads.
Sensible treatment of versioning, content-hash deduplication, and reference-counted garbage collection, and how they reuse the version/chunk model from both parts.
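
Content-hash deduplication and reference-counted garbage collection fit together roughly as in the sketch below. It is a toy, single-process model: `DedupStore` and its methods are hypothetical, SHA-256 is an assumed choice of content hash, and a real system would need the count updates and the sweep to be safe against concurrent uploads.

```python
import hashlib


class DedupStore:
    """Blobs keyed by content hash, with a reference count per blob."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.refcount: dict[str, int] = {}

    def add_reference(self, data: bytes) -> str:
        """Store the bytes once; later identical uploads only bump the count."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def release(self, digest: str) -> None:
        """Called when a file version that referenced the blob is deleted."""
        if self.refcount.get(digest, 0) <= 0:
            raise KeyError(digest)
        self.refcount[digest] -= 1

    def collect_garbage(self) -> list:
        """Delete blobs that no version references any more."""
        dead = [d for d, count in self.refcount.items() if count == 0]
        for digest in dead:
            del self.blobs[digest]
            del self.refcount[digest]
        return dead
```

Separating `release` from `collect_garbage` reflects a common design choice: deletion only decrements a counter in metadata, and a background sweep removes the bytes later, which leaves room for a grace period before data is actually destroyed.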

Follow-up Questions

How would you support file versioning and let a user restore a previous version? What changes in the data model and GC?
How do you implement real-time sync across a user's devices (a desktop client detecting and pushing local changes) — what's the change-notification mechanism?
How would you add server-side search over file names and contents without slowing the write path?
How does the design change for multi-region users — where do metadata and blobs live, and how do you handle a user sharing a file across regions?
