Design a production file storage service
Company: Harvey AI
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Technical Screen
Design a production-grade file storage service that supports the following semantics and constraints.

APIs:
- addFile(String path) creates any missing folders and stores a file at a path such as "path/to/somewhere/file.txt".
- list(String path) returns the immediate children of the directory at the given path.

Constraints:
- Each directory can contain at most 5 entries (files + folders); attempts to exceed this limit must be rejected atomically.
- Duplicate file names in the same directory are auto-renamed using OS-style suffixes (e.g., base (1).ext, base (2).ext), including inputs that already contain such suffixes.

Describe:
- the overall architecture (API layer, metadata service, content store);
- the metadata schema and store choice (relational vs. NoSQL);
- how you enforce the per-directory capacity limit and renaming atomically under concurrent requests;
- transaction boundaries and idempotency;
- content-addressed vs. location-addressed storage, and how file bytes are stored (e.g., object storage references);
- handling of large files (streaming/resumable uploads);
- the consistency model and failure/rollback handling;
- scalability (partitioning keys, sharding, caching);
- observability and rate/quota enforcement;
- data lifecycle (retention, deletion, versioning);
- security (authn/authz, path traversal protection, encryption in transit and at rest).

Also define key SLIs/SLOs.
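A single-process reference model makes the required semantics concrete before any distribution is involved. The Python sketch below is hypothetical: add_file and list_dir stand in for addFile and list, one lock stands in for the transaction boundary, and it assumes that a trailing " (n)" in an incoming name marks an earlier copy, so re-adding "file (1).txt" takes the smallest free suffix of "file". It also assumes files and folders share one namespace per directory. Capacity is checked before anything is mutated, so a rejected call leaves no half-created folders.

```python
from __future__ import annotations

import os
import re
import threading

MAX_ENTRIES = 5  # per-directory cap (files + folders) from the spec
# Assumption: a trailing " (n)" on the stem marks an earlier auto-renamed copy.
_SUFFIX = re.compile(r"^(?P<base>.*) \((?P<n>\d+)\)$")


def resolve_name(name: str, taken: set[str]) -> str:
    """Return name if free, else the smallest free 'base (n).ext' with n >= 1."""
    if name not in taken:
        return name
    stem, ext = os.path.splitext(name)  # "a.tar.gz" -> ("a.tar", ".gz")
    m = _SUFFIX.match(stem)
    base = m.group("base") if m else stem  # "file (1)" -> "file"
    n = 1
    while f"{base} ({n}){ext}" in taken:
        n += 1
    return f"{base} ({n}){ext}"


class Dir:
    def __init__(self) -> None:
        # Files and folders share one namespace per directory (assumption).
        self.children: dict[str, Dir | bytes] = {}


class FileTree:
    """Hypothetical in-memory model; one lock stands in for a DB transaction."""

    def __init__(self) -> None:
        self.root = Dir()
        self._lock = threading.Lock()

    def add_file(self, path: str, data: bytes = b"") -> str:
        """Mirror of addFile: returns the stored path, possibly renamed."""
        parts = [p for p in path.split("/") if p]
        if not parts:
            raise ValueError("path must name a file")
        *folders, filename = parts
        with self._lock:
            # Phase 1 (read-only): descend through folders that already exist.
            cur, i = self.root, 0
            while i < len(folders) and folders[i] in cur.children:
                nxt = cur.children[folders[i]]
                if not isinstance(nxt, Dir):
                    raise NotADirectoryError("/".join(folders[: i + 1]))
                cur, i = nxt, i + 1
            # The first new entry (a folder or the file itself) lands in `cur`.
            if len(cur.children) >= MAX_ENTRIES:
                raise OverflowError("directory full: /" + "/".join(folders[:i]))
            # Phase 2 (mutate): nothing below can fail, so the call is all-or-nothing.
            for name in folders[i:]:
                new_dir = Dir()
                cur.children[name] = new_dir
                cur = new_dir  # a fresh folder is empty, so it always has room
            final = resolve_name(filename, set(cur.children))
            cur.children[final] = data
            return "/".join([*folders, final])

    def list_dir(self, path: str) -> list[str]:
        """Mirror of list: names of the immediate children, sorted."""
        with self._lock:
            cur = self.root
            for name in (p for p in path.split("/") if p):
                cur = cur.children[name]  # KeyError if the path does not exist
                if not isinstance(cur, Dir):
                    raise NotADirectoryError(path)
            return sorted(cur.children)
```

Doing the whole walk-check-mutate sequence under one lock is what makes the capacity check and the rename decision race-free here; a distributed version needs the same boundary from its metadata store.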
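Because addFile accepts a client-supplied path, path traversal protection belongs at the API layer. A minimal sketch, assuming the metadata store uses path segments as opaque keys (so nothing is ever joined into an OS path): it rejects rather than resolves "." and "..", refuses empty segments, backslashes and control characters, and NFC-normalizes names. The limits MAX_SEGMENT_BYTES and MAX_DEPTH and the normalization policy are illustrative assumptions.

```python
import unicodedata

MAX_SEGMENT_BYTES = 255  # assumed per-name limit, similar to common filesystems
MAX_DEPTH = 64           # assumed nesting limit


def parse_path(raw: str, allow_root: bool = False) -> list[str]:
    """Validate a client path and return its segments, or raise ValueError."""
    # NFC so that visually identical names (precomposed vs. combining accent)
    # map to the same key and cannot slip past duplicate detection.
    path = unicodedata.normalize("NFC", raw)
    if path.startswith("/"):
        path = path[1:]  # tolerate one leading slash: "/a/b" means "a/b"
    if path == "" and allow_root:
        return []  # list("") or list("/") targets the root directory
    segments = path.split("/")
    if len(segments) > MAX_DEPTH:
        raise ValueError("path too deep")
    for seg in segments:
        # Empty ("a//b", trailing "/"), "." and ".." are rejected, never resolved.
        if seg in ("", ".", ".."):
            raise ValueError(f"illegal segment {seg!r} in {raw!r}")
        # Backslashes and control characters (including NUL) are never valid.
        if "\\" in seg or any(unicodedata.category(ch) == "Cc" for ch in seg):
            raise ValueError(f"illegal character in segment {seg!r}")
        if len(seg.encode("utf-8")) > MAX_SEGMENT_BYTES:
            raise ValueError("segment too long")
    return segments
```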
Quick Answer: This question evaluates a candidate's competency in distributed systems and storage architecture: metadata schema design, concurrency control for atomic per-directory limits and renaming, transaction boundaries and idempotency, storage models for file bytes, large-file handling, consistency and failure modes, scalability and partitioning, observability, lifecycle management, and security. It belongs to the System Design domain and is commonly asked because it reveals how a candidate reasons about architectural trade-offs and practical implementation concerns, testing both high-level design and hands-on implementation.
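For the metadata store, a relational database gives the per-directory check a transaction to live in. The sketch below uses Python's built-in sqlite3 purely as a stand-in for a production relational store such as PostgreSQL; the nodes table, its child_count counter and the add_file helper are hypothetical. A conditional UPDATE on the parent's counter enforces the cap, a UNIQUE (parent_id, name) constraint catches a concurrent writer that picked the same name, and wrapping the whole addFile in one transaction means a rejection rolls back any folders it created. It assumes OS-style name resolution already happened; an IntegrityError signals a race, and the caller re-resolves and retries.

```python
from __future__ import annotations

import sqlite3  # stand-in for a production relational store (assumption)

MAX_ENTRIES = 5  # per-directory cap from the spec

SCHEMA = """
CREATE TABLE nodes (
    id          INTEGER PRIMARY KEY,
    parent_id   INTEGER REFERENCES nodes(id),
    name        TEXT    NOT NULL,
    is_dir      INTEGER NOT NULL,
    child_count INTEGER NOT NULL DEFAULT 0,  -- denormalized, guarded by the cap
    blob_key    TEXT,                        -- object-store reference for file bytes
    UNIQUE (parent_id, name)                 -- siblings never share a name
);
INSERT INTO nodes (id, parent_id, name, is_dir) VALUES (1, NULL, '', 1);  -- root
"""
ROOT_ID = 1


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def _insert_child(conn, parent_id, name, is_dir, blob_key=None) -> int:
    # Conditional increment: matches only if the parent is a directory with room.
    # In PostgreSQL the same UPDATE also row-locks the parent, serializing writers
    # to one directory while leaving other directories untouched.
    bumped = conn.execute(
        "UPDATE nodes SET child_count = child_count + 1 "
        "WHERE id = ? AND is_dir = 1 AND child_count < ?",
        (parent_id, MAX_ENTRIES),
    ).rowcount
    if bumped != 1:
        raise OverflowError(f"directory {parent_id} is full")
    # A duplicate sibling name raises sqlite3.IntegrityError here.
    return conn.execute(
        "INSERT INTO nodes (parent_id, name, is_dir, blob_key) VALUES (?, ?, ?, ?)",
        (parent_id, name, int(is_dir), blob_key),
    ).lastrowid


def add_file(conn, folders: list[str], filename: str, blob_key: str) -> int:
    """Create missing folders plus the file in ONE transaction (all-or-nothing)."""
    with conn:  # commits on success, rolls back on any exception
        parent = ROOT_ID
        for name in folders:
            row = conn.execute(
                "SELECT id, is_dir FROM nodes WHERE parent_id = ? AND name = ?",
                (parent, name),
            ).fetchone()
            if row is None:
                parent = _insert_child(conn, parent, name, True)
            elif row[1]:
                parent = row[0]
            else:
                raise NotADirectoryError(name)
        return _insert_child(conn, parent, filename, False, blob_key)
```

Keeping directories as rows keyed by (parent_id, name) also gives a natural partitioning key: all of a directory's children share parent_id, so list is a single-partition query.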
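On content-addressed versus location-addressed storage: if the object key is a hash of the bytes, identical uploads are stored once, a key can never silently point at different bytes, and renames or moves touch only metadata, whereas a key derived from the path would require copying the object on every move. A hypothetical in-memory sketch, assuming SHA-256 keys and reference counting for deletion; a real service would usually garbage-collect unreferenced objects asynchronously after a grace period rather than delete inline.

```python
from __future__ import annotations

import hashlib
from collections import Counter
from typing import BinaryIO


class ContentStore:
    """Hypothetical in-memory stand-in for an object store keyed by SHA-256."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}
        self.refs: Counter[str] = Counter()  # metadata rows pointing at each key

    def put(self, stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
        """Store the stream's bytes once per distinct content; return the key."""
        digest, buf = hashlib.sha256(), bytearray()
        while chunk := stream.read(chunk_size):
            digest.update(chunk)  # hash incrementally, chunk by chunk
            buf += chunk  # stand-in only: a real service streams to a staging object
        key = "sha256:" + digest.hexdigest()
        self.blobs.setdefault(key, bytes(buf))  # identical content is stored once
        self.refs[key] += 1
        return key

    def release(self, key: str) -> None:
        """Drop one reference; delete the bytes when none remain."""
        self.refs[key] -= 1
        if self.refs[key] <= 0:
            del self.refs[key]
            del self.blobs[key]
```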
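Idempotency matters more than usual here because of auto-renaming: a client that retries addFile after a lost response would otherwise get a second copy named "file (1).txt". A minimal sketch, assuming the client sends a hypothetical idempotency key with each request and the server remembers the first result. A production version would persist the key in the same transaction as the metadata row, expire it after a TTL, and avoid holding one global lock across the operation.

```python
import threading
from typing import Callable


class IdempotencyStore:
    """Hypothetical in-memory map from idempotency key to the first result."""

    def __init__(self) -> None:
        self._results: dict[str, tuple[str, object]] = {}
        self._lock = threading.Lock()  # simplification: serializes all requests

    def run(self, key: str, fingerprint: str, op: Callable[[], object]) -> object:
        with self._lock:
            if key in self._results:
                seen_fp, result = self._results[key]
                # The same key with a different request body is a client bug.
                if seen_fp != fingerprint:
                    raise ValueError(f"idempotency key {key!r} reused for a different request")
                return result  # replay: no second file, no "(1)" rename
            result = op()  # if op raises, nothing is recorded and a retry re-runs it
            self._results[key] = (fingerprint, result)
            return result
```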
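For large files, a resumable upload lets a client continue from the last byte the server acknowledged instead of restarting. A hypothetical sketch of per-upload server state, assuming an offset-based protocol: the client declares the total size and a SHA-256 up front, each write carries its starting offset, and a reconnecting client reads offset to learn where to resume. The bytearray stands in for a staging object (for example an object-store multipart upload); the metadata row would be committed only after complete() succeeds.

```python
import hashlib


class ResumableUpload:
    """Hypothetical server-side state for one offset-based resumable upload."""

    def __init__(self, total_size: int, sha256_hex: str) -> None:
        self.total_size = total_size
        self.expected_sha256 = sha256_hex
        self.data = bytearray()  # stand-in for a staging object / multipart upload

    @property
    def offset(self) -> int:
        """Bytes received so far; a reconnecting client resumes from here."""
        return len(self.data)

    def write(self, offset: int, chunk: bytes) -> int:
        """Append a chunk that starts at `offset`; return the new offset."""
        if offset > self.offset:
            raise ValueError(f"gap: server has {self.offset} bytes, chunk starts at {offset}")
        # A retried chunk may overlap bytes already received (its ack was lost);
        # keep only the unseen tail. Assumes resent bytes are identical; the
        # checksum in complete() catches it if they are not.
        new = chunk[self.offset - offset:]
        if self.offset + len(new) > self.total_size:
            raise ValueError("upload exceeds declared size")
        self.data += new
        return self.offset

    def complete(self) -> bytes:
        """Verify size and checksum before the file becomes visible."""
        if self.offset != self.total_size:
            raise ValueError(f"incomplete: {self.offset}/{self.total_size} bytes")
        if hashlib.sha256(self.data).hexdigest() != self.expected_sha256:
            raise ValueError("checksum mismatch")
        return bytes(self.data)
```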
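For rate and quota enforcement, a per-tenant token bucket is a common choice: it allows short bursts up to burst requests while capping the sustained rate at rate requests per second. A hypothetical sketch with an injectable clock so it can be tested deterministically; in a multi-instance API layer the bucket state would live in a shared store rather than in process memory, and byte quotas would be tracked separately in the metadata store.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical per-tenant limiter: sustained `rate` req/s, bursts up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = burst  # start full so an idle tenant can burst immediately
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # the API layer would answer 429 Too Many Requests
```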
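For SLIs and SLOs, it helps to turn a target into concrete numbers. Typical SLIs here are the fraction of non-5xx responses, metadata-path latency percentiles for addFile/list, and upload completion rate. Assuming, purely as an illustration, a 99.9% availability SLO over a 30-day window (the targets are not from the question), the error budget and the burn rate of an observed error ratio can be computed as follows.

```python
def error_budget_minutes(slo: float, window_days: float = 30) -> float:
    """Minutes of total outage an availability SLO tolerates over the window."""
    return (1 - slo) * window_days * 24 * 60


def burn_rate(bad: int, total: int, slo: float) -> float:
    """Observed error ratio divided by the allowed one; 1.0 spends exactly the budget.

    Assumption: "bad" counts 5xx responses and timeouts only; a correct
    "directory full" rejection is a good event, since the service behaved as specified.
    """
    if total == 0:
        return 0.0
    return (bad / total) / (1 - slo)
```

With those illustrative numbers the budget is 43.2 minutes of full outage per 30 days, and 20 failed requests out of 10,000 burn it at twice the sustainable rate, which is the kind of signal a burn-rate alert would page on.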