PracHub

Design a production file storage service

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in distributed systems and storage architecture. It covers metadata schema design, concurrency control for atomic per-directory limits and renaming, transaction boundaries and idempotency, storage models for file bytes, large-file handling, consistency and failure modes, scalability and partitioning, observability, lifecycle management, and security. It is commonly asked in system design interviews because it reveals how a candidate reasons about architectural trade-offs and practical implementation concerns, testing both high-level design and hands-on application.

Company: Harvey AI

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Sep 6, 2025, 12:00 AM

System Design: Production-Grade File Storage Service

Problem

Design a production-grade file storage service with the following APIs, semantics, and constraints.

APIs

  • addFile(String path)
    • Creates any missing folders and stores a file at a path like "path/to/somewhere/file.txt".
  • list(String path)
    • Returns the immediate children (files and folders) of the directory at the given path.
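
To make the two API contracts concrete, here is a minimal single-process sketch in Python. It models only folder auto-creation and listing of immediate children; the capacity limit, renaming, and persistence are deliberately left out. The names InMemoryFileStore, add_file, and list_dir are illustrative assumptions, not part of the problem statement.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class File:
    name: str


@dataclass
class Directory:
    name: str
    children: Dict[str, Union["Directory", File]] = field(default_factory=dict)


class InMemoryFileStore:
    """Toy model of addFile/list semantics (hypothetical names, no limits yet)."""

    def __init__(self) -> None:
        self.root = Directory(name="")

    @staticmethod
    def _segments(path: str) -> List[str]:
        # Drop empty segments so "/a/b" and "a//b/" resolve the same way.
        return [s for s in path.split("/") if s]

    def add_file(self, path: str) -> None:
        segments = self._segments(path)
        if not segments:
            raise ValueError("path must name a file")
        *folders, filename = segments
        node = self.root
        for name in folders:
            child = node.children.get(name)
            if child is None:
                # Create the missing folder on the way down.
                child = Directory(name=name)
                node.children[name] = child
            elif isinstance(child, File):
                raise NotADirectoryError(name)
            node = child
        # Collision handling is out of scope here: an existing name is overwritten.
        node.children[filename] = File(name=filename)

    def list_dir(self, path: str) -> List[str]:
        node = self.root
        for name in self._segments(path):
            child = node.children.get(name)
            if not isinstance(child, Directory):
                raise FileNotFoundError(path)
            node = child
        # Immediate children only; no recursion into subfolders.
        return sorted(node.children)
```

For example, after add_file("path/to/somewhere/file.txt"), list_dir("path") returns only "to", not the deeper entries.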

Constraints and Semantics

  1. Per-directory capacity limit: each directory can contain at most 5 entries (files + folders). Attempts to exceed this limit must be atomically rejected.
  2. Name collision handling: duplicate file names in the same directory are auto-renamed using OS-style suffixes, e.g., base (1).ext, base (2).ext. Inputs that already contain such suffixes must be handled correctly (do not double-suffix; accept if free; otherwise continue numbering).
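
The renaming rule can be sketched as a pure function over the set of names already present in the directory. This is one possible reading of the rule, under stated assumptions: names are case-sensitive, only the last extension is treated as the extension, and an input that already carries a suffix continues numbering from its own number. The helpers split_name and resolve_name are hypothetical.

```python
import re
from typing import Set, Tuple

# Matches a stem that ends in an OS-style suffix such as "base (2)".
_SUFFIX = re.compile(r"^(?P<base>.*\S) \((?P<n>[1-9]\d*)\)$")


def split_name(name: str) -> Tuple[str, int, str]:
    """Split 'base (2).ext' into ('base', 2, '.ext'); n is 0 without a suffix."""
    dot = name.rfind(".")
    # A leading dot (e.g. ".bashrc") is treated as part of the stem.
    stem, ext = (name[:dot], name[dot:]) if dot > 0 else (name, "")
    match = _SUFFIX.match(stem)
    if match:
        return match.group("base"), int(match.group("n")), ext
    return stem, 0, ext


def resolve_name(requested: str, existing: Set[str]) -> str:
    """Return the requested name if free, else the next free 'base (n).ext'."""
    if requested not in existing:
        return requested  # accept as-is, even if it already has a suffix
    base, n, ext = split_name(requested)
    while True:
        n += 1  # continue numbering instead of appending a second suffix
        candidate = f"{base} ({n}){ext}"
        if candidate not in existing:
            return candidate
```

With existing = {"base.ext", "base (1).ext"}, a request for "base.ext" resolves to "base (2).ext", and a request for "base (1).ext" also resolves to "base (2).ext" rather than "base (1) (1).ext". In a real service this lookup would have to run inside the same atomic step as the capacity check and the insert.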

What to Describe

  • Overall architecture: API layer, metadata service, content store.
  • Metadata schema and store choice (relational vs NoSQL).
  • How to enforce the per-directory capacity limit and renaming atomically under concurrent requests.
  • Transaction boundaries and idempotency.
  • Content-addressed vs location-addressed storage and how file bytes are stored (e.g., object storage references).
  • Handling large files (streaming/resumable uploads).
  • Consistency model and failure/rollback.
  • Scalability (partitioning keys, sharding, caching).
  • Observability and rate/quota enforcement.
  • Data lifecycle (retention, deletion, versioning).
  • Security (authn/authz, path traversal protection, encryption in transit/at rest).
  • Key SLIs/SLOs.
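
As one illustration of the relational option, the sketch below uses SQLite from Python's standard library to show a possible transaction boundary: the idempotency lookup, the capacity check, and the inserts all happen inside a single write transaction. Everything here is an assumption made for illustration, including the table names entries and idempotency, the request_key parameter, and the choice of SQLite itself. Renaming is omitted, so a duplicate name surfaces as a unique-constraint error.

```python
import sqlite3

MAX_ENTRIES = 5  # per-directory limit from the problem statement

SCHEMA = """
CREATE TABLE entries (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES entries(id),
    name      TEXT NOT NULL,
    kind      TEXT NOT NULL CHECK (kind IN ('dir', 'file')),
    UNIQUE (parent_id, name)
);
CREATE TABLE idempotency (
    request_key TEXT PRIMARY KEY,
    entry_id    INTEGER NOT NULL REFERENCES entries(id)
);
INSERT INTO entries (id, parent_id, name, kind) VALUES (1, NULL, '', 'dir');
"""


class DirectoryFull(Exception):
    pass


def open_store() -> sqlite3.Connection:
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def _insert_child(conn, parent_id, name, kind):
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM entries WHERE parent_id = ?", (parent_id,)
    ).fetchone()
    if count >= MAX_ENTRIES:
        raise DirectoryFull(name)
    cur = conn.execute(
        "INSERT INTO entries (parent_id, name, kind) VALUES (?, ?, ?)",
        (parent_id, name, kind),
    )
    return cur.lastrowid


def add_file(conn, path, request_key):
    *folders, filename = [s for s in path.split("/") if s]
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before any check
    try:
        row = conn.execute(
            "SELECT entry_id FROM idempotency WHERE request_key = ?",
            (request_key,),
        ).fetchone()
        if row:  # replayed request: return the earlier result unchanged
            conn.execute("COMMIT")
            return row[0]
        parent_id = 1  # root directory
        for name in folders:
            row = conn.execute(
                "SELECT id, kind FROM entries WHERE parent_id = ? AND name = ?",
                (parent_id, name),
            ).fetchone()
            if row is None:
                parent_id = _insert_child(conn, parent_id, name, "dir")
            elif row[1] != "dir":
                raise NotADirectoryError(name)
            else:
                parent_id = row[0]
        entry_id = _insert_child(conn, parent_id, filename, "file")
        conn.execute(
            "INSERT INTO idempotency (request_key, entry_id) VALUES (?, ?)",
            (request_key, entry_id),
        )
        conn.execute("COMMIT")
        return entry_id
    except Exception:
        conn.execute("ROLLBACK")  # rejected requests leave no trace
        raise
```

SQLite serializes writers, so the count-then-insert pair cannot interleave with another writer. On a multi-writer database the same effect would need something like a row lock on the parent directory or a conditional update of a child counter, which is a design choice to discuss rather than something this sketch settles.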

Assume a single logical namespace with a root directory and multi-tenant users. You may add minimal endpoints (e.g., upload sessions) to make large-file handling realistic.
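
The upload-session idea mentioned above could look roughly like the following toy model. It is a sketch under loud assumptions: chunks are buffered in memory rather than streamed to object storage, the blobs dictionary stands in for the content store, and the class and method names (UploadSessions, start, offset, append, complete) are hypothetical. It shows offset-based resumption and a content-addressed key derived from SHA-256.

```python
import hashlib
import uuid
from typing import Dict


class UploadSessions:
    """Toy resumable-upload model with content-addressed completion."""

    def __init__(self) -> None:
        self._parts: Dict[str, bytearray] = {}
        self.blobs: Dict[str, bytes] = {}  # stand-in for object storage

    def start(self) -> str:
        session_id = uuid.uuid4().hex
        self._parts[session_id] = bytearray()
        return session_id

    def offset(self, session_id: str) -> int:
        # A client that lost its connection asks where to resume.
        return len(self._parts[session_id])

    def append(self, session_id: str, offset: int, chunk: bytes) -> int:
        buf = self._parts[session_id]
        if offset != len(buf):
            # Reject gaps and stale retries; the client should re-query offset().
            raise ValueError(f"expected offset {len(buf)}, got {offset}")
        buf.extend(chunk)
        return len(buf)

    def complete(self, session_id: str) -> str:
        data = bytes(self._parts.pop(session_id))
        key = "sha256/" + hashlib.sha256(data).hexdigest()
        # Identical bytes map to the same key, so duplicates are stored once.
        self.blobs.setdefault(key, data)
        return key
```

In this model the metadata row for a file would store the returned key as its object reference, and addFile would only commit metadata after complete succeeds, so a failed upload never leaves a visible entry.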
