Design a Cloud File Storage and Sync Service (Google Drive)
Company: Omnissa
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
## Design a Cloud File Storage and Sync Service (Google Drive)
Design a cloud file-storage and synchronization service similar to Google Drive
or Dropbox. Users can upload files of arbitrary type and size from a web app, a
desktop client, and a mobile app; access those files from any device; keep a
local folder automatically synchronized with the cloud; and share files or
folders with other users. Your task is to design the end-to-end system: how bytes
and metadata are stored, the upload/download paths, multi-device synchronization,
and sharing/permissions — all at scale and without losing user data.
### Constraints & Assumptions
- Roughly 100M registered users, ~10M daily active users.
- Average user stores ~50 GB. Individual files range from a few KB to several GB;
assume a single-file upload cap of 5 GB.
- The workload is read-heavy but still carries substantial write/upload volume;
uploads can be large and long-running.
- The desktop client keeps a watched folder in near-real-time sync (a change on
one device should appear on another within a few seconds while both are online).
- **Durability is paramount**: no data loss on a single host/disk/region failure.
Target availability ~99.9%.
- Files are mutable; users expect version history and the ability to restore a
previous version.
- Most access is to a user's own files; sharing is a smaller fraction of traffic.
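A quick back-of-the-envelope calculation turns these constraints into sizing numbers. The sketch below uses decimal units and assumes a 4 MB block size, which is a hypothetical design parameter rather than a stated requirement. The totals are logical bytes, before any replication overhead or deduplication savings.

```python
# Back-of-the-envelope sizing from the stated constraints (decimal units).
MB = 10**6
GB = 10**9
EB = 10**18

registered_users = 100_000_000   # ~100M registered users
avg_stored_bytes = 50 * GB       # ~50 GB stored per user on average
max_file_bytes = 5 * GB          # single-file upload cap
block_bytes = 4 * MB             # ASSUMPTION: block size is a design choice

# Total logical storage, before replication and before deduplication.
total_logical_bytes = registered_users * avg_stored_bytes

# Ceiling division: number of blocks needed for the largest allowed file.
blocks_per_max_file = -(-max_file_bytes // block_bytes)

print(total_logical_bytes // EB, "EB of logical data")       # 5 EB
print(blocks_per_max_file, "blocks for a maximum-size file")  # 1250 blocks
```

At this scale the content clearly belongs in a horizontally scaled object or block store, and a maximum-size file maps to roughly a thousand block references in metadata.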
### Clarifying Questions to Ask
- What is the hard cap on single-file size and on per-user quota that we must
support?
- Do we need version history and "restore previous version," and if so for how
long do we retain old versions?
- After an edit on device A, how fresh must device B be — a few seconds, or is
minutes-level eventual consistency acceptable?
- Is end-to-end (client-side) encryption a requirement, or is server-side
encryption at rest sufficient?
- Do we need real-time collaborative editing of file contents (Google Docs
style), or only file-level sync where the unit of change is a whole file?
- What is the rough read:write ratio and the geographic distribution of users
(single region vs. global)?
### Part 1 — Storage and the Upload/Download Path
Design how a file's raw bytes and its metadata are stored, and the upload and
download flows. Explicitly handle large files, deduplication of identical
content, and how a client resumes an upload that was interrupted partway through.
```hint Chunking
Don't store a file as one opaque blob. Split each file into fixed-size blocks
(e.g., 4 MB) and represent the file as an ordered list of block references. Think
about what that buys you for resume, dedup, and changing only part of a large
file.
```
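To make the chunking hint concrete, here is a minimal sketch that splits a byte string into fixed-size blocks and identifies each block by its SHA-256 digest. The names `build_manifest` and `blocks_to_upload` are illustrative, and the 4 MiB default is just one plausible block size. A real client would stream from disk instead of holding the whole file in memory.

```python
import hashlib
from typing import Iterator, List

BLOCK_SIZE = 4 * 1024 * 1024  # ASSUMPTION: 4 MiB blocks


def iter_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size blocks; the last one may be shorter."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def build_manifest(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Represent a file as an ordered list of content hashes."""
    return [hashlib.sha256(block).hexdigest()
            for block in iter_blocks(data, block_size)]


def blocks_to_upload(old: List[str], new: List[str]) -> List[int]:
    """Indexes in the new manifest whose block differs from the old version."""
    return [i for i, digest in enumerate(new)
            if i >= len(old) or old[i] != digest]
```

With fixed-size blocks, an in-place edit touches only the blocks it overlaps. An insertion near the start of the file shifts every later block boundary, which is the usual argument for considering content-defined chunking instead.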
```hint Two stores
Separate the *content* store from the *metadata* store. Bytes belong in object
storage / a block store; the file tree, names, sizes, and block lists belong in a
database. Decide what each one holds and how clients reach the bytes without
streaming them through your app servers.
```
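One way to picture the split between the two stores is the in-memory sketch below. `BlockStore`, `MetadataStore`, and `FileRecord` are hypothetical names, and the dictionaries stand in for object storage and a database. In a real deployment the client would more likely send bytes straight to object storage (for example through short-lived signed URLs) instead of calling `put` on an application server.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class FileRecord:
    """Metadata only: the bytes live in the block store."""
    name: str
    size: int
    block_hashes: List[str]


class BlockStore:
    """Content store: immutable blocks keyed by the SHA-256 of their bytes."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blocks.setdefault(digest, data)  # identical content stored once
        return digest

    def missing(self, hashes: Iterable[str]) -> List[str]:
        """Hashes the server does not have yet, in first-seen order."""
        result: List[str] = []
        for digest in hashes:
            if digest not in self._blocks and digest not in result:
                result.append(digest)
        return result


class MetadataStore:
    """Metadata store: a file becomes visible only once all blocks exist."""

    def __init__(self, blocks: BlockStore) -> None:
        self._blocks = blocks
        self._files: Dict[str, FileRecord] = {}

    def commit(self, file_id: str, record: FileRecord) -> None:
        missing = self._blocks.missing(record.block_hashes)
        if missing:
            raise ValueError(f"{len(missing)} block(s) not uploaded yet")
        self._files[file_id] = record

    def get(self, file_id: str) -> FileRecord:
        return self._files[file_id]
```

Resume and deduplication fall out of the same call: after an interruption, or before uploading a file whose content already exists, the client asks `missing(...)` and sends only the blocks returned.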
### Part 2 — Multi-Device Synchronization
A user runs the desktop client on two laptops. Design how a change made on one
device propagates to the other within seconds: how the client detects local
changes, how it learns about remote changes, and how you resolve conflicting
edits made on two devices.
```hint Change feed
Give each user (or each "drive") a monotonically increasing version cursor. Every
change appends to a per-user journal. A client remembers its last-seen cursor and
asks "what changed since X?" — then decide how the server pushes new changes
instead of making clients poll constantly.
```
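The change-feed idea can be sketched as a per-user append-only journal. `Journal` and `Change` are hypothetical names, and the list stands in for a durable, ordered log. The sketch covers only the pull side ("what changed since X?"); a push channel such as long polling or a WebSocket would typically just tell the client that it is time to call `changes_since` again.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Change:
    cursor: int  # monotonically increasing per user
    path: str
    op: str      # "upsert" or "delete"


class Journal:
    """Append-only change log for a single user (or drive)."""

    def __init__(self) -> None:
        self._entries: List[Change] = []

    def append(self, path: str, op: str) -> int:
        cursor = len(self._entries) + 1  # cursors start at 1
        self._entries.append(Change(cursor, path, op))
        return cursor

    def changes_since(self, cursor: int, limit: int = 100) -> Tuple[List[Change], int]:
        """Return up to `limit` changes with cursor > `cursor`, plus the new cursor."""
        # Entry with cursor n sits at index n - 1, so everything after
        # `cursor` starts at index `cursor`.
        batch = self._entries[cursor:cursor + limit]
        new_cursor = batch[-1].cursor if batch else cursor
        return batch, new_cursor
```

A client persists the returned cursor only after it has applied the batch locally, so a crash mid-sync replays changes instead of skipping them.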
```hint Conflicts
When two devices edit the same file while offline, you generally cannot safely
auto-merge arbitrary binary bytes. Think about a non-destructive resolution that
never silently throws away a user's edit.
```
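One non-destructive policy is a "conflicted copy": the upload carries the version the edit was based on, and if the server has moved past that version, the incoming edit is stored under a new name instead of overwriting. The sketch below is an assumption about how that check might look; the function names and the naming pattern are illustrative.

```python
import posixpath
from typing import Tuple


def conflicted_copy_name(path: str, device: str, date: str) -> str:
    """Build a sibling name that keeps the original extension."""
    root, ext = posixpath.splitext(path)
    return f"{root} (conflicted copy from {device}, {date}){ext}"


def resolve_upload(path: str, server_version: int, base_version: int,
                   device: str, date: str) -> Tuple[str, int]:
    """Decide where an uploaded edit lands and which version it becomes.

    base_version is the version the device had when the edit started.
    """
    if base_version == server_version:
        # No concurrent edit: becomes the next version of the same file.
        return path, server_version + 1
    # Someone else committed first: keep both edits, never overwrite.
    return conflicted_copy_name(path, device, date), 1
```

For example, an edit based on version 7 arriving when the server is already at version 8 is saved as `docs/report (conflicted copy from laptop-b, 2024-05-01).docx`, leaving version 8 of `docs/report.docx` untouched.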
#### Clarifying Questions for this Part
- When the same file is edited offline on two devices, is automatically keeping
both versions (a "conflicted copy") acceptable, or does the product require an
attempt at content-level merge?
### Part 3 — Sharing, Permissions, and Scale/Reliability
Extend the design so a user can share a single file or an entire folder with
specific other users or via a link, with view-only or edit permissions. Then
discuss how you scale the metadata store and keep the whole system durable and
available.
```hint Permissions model
Attach an access-control list to a node in the file tree. For a folder share,
work out how a grant on the folder applies to all descendants *without* rewriting
a permission row on every child file.
```
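A minimal sketch of inherited permissions: grants are stored only on the node where the share was made, and a check walks up the parent chain. `FileTree` is a hypothetical in-memory stand-in for the metadata store, and "the most permissive grant on the path wins" is an assumed policy, not a requirement from the prompt.

```python
from typing import Dict, Optional

ROLE_RANK = {"view": 1, "edit": 2}  # higher rank is more permissive


class FileTree:
    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.acl: Dict[str, Dict[str, str]] = {}  # node -> {user: role}

    def add(self, node: str, parent: Optional[str] = None) -> None:
        self.parent[node] = parent

    def grant(self, node: str, user: str, role: str) -> None:
        """Record one ACL row on the shared node; children are not touched."""
        self.acl.setdefault(node, {})[user] = role

    def effective_role(self, node: str, user: str) -> Optional[str]:
        """Walk from the node to the root, keeping the strongest grant seen."""
        best: Optional[str] = None
        current: Optional[str] = node
        while current is not None:
            role = self.acl.get(current, {}).get(user)
            if role and (best is None or ROLE_RANK[role] > ROLE_RANK[best]):
                best = role
            current = self.parent[current]
        return best
```

The cost of a check is proportional to folder depth, not subtree size, and sharing a folder with a million descendants still writes a single row.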
```hint Sharding
The metadata DB, not byte storage, is usually the harder scaling bottleneck.
Decide on a shard key (hint: think about what keeps one user's tree and one
folder subtree cheap to query) and how an item shared between two users spans shards.
```
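One possible answer, sketched under stated assumptions: shard by the owning user's ID so that a user's whole tree lives on one shard, and represent a shared item in the recipient's view as a small pointer row that names the owner. `shard_for`, `SharedPointer`, and the shard count of 64 are all illustrative.

```python
import hashlib
from dataclasses import dataclass

NUM_SHARDS = 64  # ASSUMPTION: fixed shard count for illustration


def shard_for(owner_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an owner to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(owner_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


@dataclass(frozen=True)
class SharedPointer:
    """Row stored on the recipient's shard; the item itself stays with the owner."""
    recipient_id: str
    owner_id: str
    node_id: str


def shard_holding_item(pointer: SharedPointer) -> int:
    """Reads of a shared item are routed to the owner's shard."""
    return shard_for(pointer.owner_id)
```

Plain modulo hashing makes changing the shard count expensive, so a directory service or consistent hashing is a common refinement. Python's built-in `hash()` is avoided here because string hashing is randomized per process.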
### Follow-up Questions
- Given content-addressed blocks, how would you support version history and
"restore a previous version" cheaply in both storage and bookkeeping?
- A 5 GB upload fails at 90%. Walk through exactly what state exists on the client
and server, and how the client resumes without re-sending completed blocks.
- How do you garbage-collect blocks no longer referenced by any file or version,
safely, while new uploads may be referencing those same blocks concurrently?
- How would you add server-side full-text search across a user's files without
scanning object storage on every query?
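
As a starting point for the version-history and garbage-collection follow-ups, the sketch below treats each version as just another manifest of block hashes. `VersionedFile` and `live_blocks` are hypothetical names. The sketch deliberately ignores the concurrency problem raised above; a real collector would also need to protect blocks referenced by in-flight uploads, for instance with a grace period.

```python
from typing import Iterable, List, Set


class VersionedFile:
    """Version history as a list of manifests; block bytes are shared."""

    def __init__(self) -> None:
        self.versions: List[List[str]] = []

    def save(self, manifest: Iterable[str]) -> int:
        """Append a new version and return its 1-based version number."""
        self.versions.append(list(manifest))
        return len(self.versions)

    def restore(self, version: int) -> int:
        """Restore by appending a copy of an old manifest; history is kept."""
        return self.save(self.versions[version - 1])

    def current(self) -> List[str]:
        return self.versions[-1]


def live_blocks(files: Iterable[VersionedFile]) -> Set[str]:
    """Blocks referenced by any retained version; everything else is a GC candidate."""
    return {digest
            for f in files
            for manifest in f.versions
            for digest in manifest}
```

Restoring costs one metadata row and no byte copies, because the old blocks are still present as long as some retained version references them.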
Quick Answer: This question asks for the end-to-end design of a cloud file storage and sync service, evaluating a candidate's grasp of distributed systems fundamentals such as metadata versus content storage, chunking, and deduplication. It is a common system design interview question used to assess reasoning about multi-device synchronization, conflict resolution, sharing permissions, and durability at scale, testing practical architectural application rather than pure theory.