Design an object store with deduplication
Company: Snowflake
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Technical Screen
## System Design Prompt
Design a simplified cloud object storage service (an object store) that lets users upload and download files.
The key focus is **reducing cost by avoiding storing duplicate files** (deduplication).
### Core Requirements
- Upload a file (binary blob) and store it durably.
- Download a previously uploaded file by an identifier.
- Support basic metadata (filename, content-type, size, upload time, owner).
- Avoid saving duplicates: if multiple users upload the exact same content, store it only once.
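
One common way to meet the deduplication requirement is content addressing: key the stored bytes by a cryptographic hash of the content, and let each user-visible object hold only metadata plus that hash. The sketch below is a minimal in-memory illustration, not a prescribed solution. The class name `DedupStore`, the `obj-N` identifier scheme, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    """Hypothetical in-memory stand-in for a content-addressed object store."""

    blobs: dict = field(default_factory=dict)    # sha256 hex -> bytes (stored once)
    objects: dict = field(default_factory=dict)  # object id -> metadata record
    _next_id: int = 0

    def upload(self, owner: str, filename: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Write the bytes only if this exact content has not been seen before.
        self.blobs.setdefault(digest, data)
        # Every upload still gets its own object id and metadata record.
        self._next_id += 1
        object_id = f"obj-{self._next_id}"
        self.objects[object_id] = {
            "owner": owner,
            "filename": filename,
            "size": len(data),
            "sha256": digest,
        }
        return object_id

    def download(self, object_id: str) -> bytes:
        # Resolve the object's metadata to the shared blob.
        return self.blobs[self.objects[object_id]["sha256"]]
```

Two users uploading identical bytes receive distinct object ids that point at a single stored blob, so metadata stays per-user while the content is kept once.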
### Non-Goals (to keep it simpler)
- No complex lifecycle rules, tiering, or CDN features required.
- Permissions can be simplified (assume authenticated users, simple ACL).
### Scale / Assumptions (you may choose reasonable numbers)
- Large number of objects; objects can be large (up to multiple GB).
- High read/write throughput.
- Must be resilient to failures.
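
Because objects can reach multiple GB, a content hash used for duplicate detection would likely have to be computed incrementally rather than by loading the whole file into memory. The following sketch shows one way to do that with Python's standard `hashlib`. The function name `streaming_sha256` and the 8 MiB chunk size are assumptions chosen for illustration.

```python
import hashlib
from typing import BinaryIO

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per read; an assumed value, not a requirement


def streaming_sha256(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> tuple[str, int]:
    """Hash a binary stream chunk by chunk; return (hex digest, total bytes read)."""
    hasher = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        hasher.update(chunk)
        total += len(chunk)
    return hasher.hexdigest(), total
```

The digest matches what hashing the full byte string in one call would produce, so the service can hash an upload as it arrives and decide afterwards whether the content already exists.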
### What to Cover
- APIs
- High-level architecture
- Data model and storage layout
- Deduplication approach (how to detect duplicates, how to reference shared content)
- Consistency and correctness concerns (races, partial uploads)
- Deletion / garbage collection (when shared content can be removed)
- Operational considerations (monitoring, cost trade-offs)
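
For the deletion and garbage collection topic, one approach a candidate might discuss is reference counting: each shared blob tracks how many objects point at it and becomes reclaimable only when that count reaches zero. The sketch below is a single-process, in-memory illustration with hypothetical names (`RefCountedBlobs`, `put`, `delete`). A real design would need the count update and the metadata write to be atomic, and would probably defer physical deletion to a background collector so that a concurrent upload of the same content cannot race with removal.

```python
import hashlib


class RefCountedBlobs:
    """Hypothetical sketch: shared blobs are freed when their last reference goes away."""

    def __init__(self):
        self.blobs = {}     # sha256 hex -> bytes
        self.refcount = {}  # sha256 hex -> number of live objects referencing it
        self.objects = {}   # object id -> sha256 hex

    def put(self, object_id: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        # Take the new reference before releasing any old one, so that
        # overwriting an object with identical content never frees the blob.
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        previous = self.objects.get(object_id)
        self.objects[object_id] = digest
        if previous is not None:
            self._release(previous)

    def delete(self, object_id: str) -> None:
        # Raises KeyError if the object id is unknown.
        digest = self.objects.pop(object_id)
        self._release(digest)

    def _release(self, digest: str) -> None:
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            # Last reference is gone; here the blob is removed immediately.
            del self.refcount[digest]
            del self.blobs[digest]
```

Deleting one of two objects that share content leaves the blob in place; only the final delete removes it.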
Quick Answer: This question evaluates a candidate's ability to design scalable, durable object storage systems with deduplication, touching on storage architecture, data modeling, metadata management, consistency, and garbage collection.