Design a highly available blob storage service
Company: Pinterest
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
Design a large-scale, highly available blob storage service similar to Amazon S3. The service should allow clients to store, retrieve, and delete arbitrarily sized binary objects (blobs) identified by keys.
Clarify and handle at least the following aspects in your design:
- Functional APIs (for example: create bucket, put object, get object, delete object, list objects in a bucket).
- Non-functional requirements: scalability to billions of objects, high availability, very high durability, and reasonable latency.
- Data model for buckets, objects, and metadata.
- Overall high-level architecture (front-end/API layer, metadata management, storage nodes, background services).
- How you partition and route traffic and data so the system can scale horizontally.
- How you achieve durability and fault tolerance (for example, replication or erasure coding across machines and data centers, handling node or rack failures, background repair).
- Consistency model (for example, eventual vs stronger guarantees) and how reads and writes flow through the system.
- How uploads and downloads work for very large objects (for example, multipart upload, range reads).
- Security and access control at a high level (authentication, authorization, encryption in transit and at rest).
- Monitoring, metrics, and capacity planning considerations.
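The partitioning and durability bullets can be made concrete with a minimal sketch. It assumes a consistent-hash ring with virtual nodes for routing object keys, plus a simple rack-aware rule that places three replicas on distinct racks. The node and rack names, the virtual-node count, and the replication factor are illustrative assumptions, not part of the question. A real design might use erasure coding or a dedicated placement service instead.

```python
import bisect
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageNode:
    """Hypothetical storage server identified by name and rack."""
    name: str
    rack: str


def _hash(value: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative routing layer)."""

    def __init__(self, nodes, vnodes_per_node=64):
        self._ring = []  # list of (token, node), sorted by token
        for node in nodes:
            for i in range(vnodes_per_node):
                self._ring.append((_hash(f"{node.name}#{i}"), node))
        self._ring.sort(key=lambda entry: entry[0])
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, bucket: str, key: str, count: int = 3):
        """Walk clockwise from the key's token, picking nodes on distinct racks."""
        start = bisect.bisect(self._tokens, _hash(f"{bucket}/{key}"))
        chosen, racks = [], set()
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node.rack in racks:
                continue  # skip same-rack nodes so one rack failure cannot lose all copies
            chosen.append(node)
            racks.add(node.rack)
            if len(chosen) == count:
                break
        if len(chosen) < count:
            raise RuntimeError("not enough racks to place all replicas")
        return chosen


if __name__ == "__main__":
    cluster = [StorageNode(f"node-{r}{i}", f"rack-{r}") for r in "abcd" for i in range(3)]
    ring = HashRing(cluster)
    print([n.name for n in ring.replicas("photos", "2024/cat.jpg")])
```

Virtual nodes spread each server's share of the key space across the ring, so adding or removing a server moves only a small fraction of keys. The rack check ensures that a single rack failure cannot remove every copy of an object.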
Walk through the main data flows (a typical write/PUT and a typical read/GET), explain key trade-offs in your design, and justify the choices you make.
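The PUT/GET walkthrough can be sketched as a single-process illustration. It rests on several assumed choices: three replicas, a write quorum of two, immutable blobs addressed by a random ID, and a metadata record that is committed only after the data write succeeds. `StorageNode`, `ObjectMeta`, and `BlobService` are hypothetical names. A real service would also shard the metadata, stream data in chunks, and run background repair and garbage collection.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


class StorageNode:
    """In-memory stand-in for a storage server (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.chunks = {}  # blob_id -> bytes

    def write(self, blob_id: str, data: bytes) -> bool:
        if not self.up:
            return False
        self.chunks[blob_id] = data
        return True

    def read(self, blob_id: str):
        return self.chunks.get(blob_id) if self.up else None


@dataclass
class ObjectMeta:
    blob_id: str
    size: int
    sha256: str
    replicas: list = field(default_factory=list)  # names of nodes holding the blob


class BlobService:
    """Front-end PUT/GET/DELETE logic with a write quorum (sketch)."""

    def __init__(self, nodes, replication=3, write_quorum=2):
        self.nodes = nodes
        self.replication = replication
        self.write_quorum = write_quorum
        self.metadata = {}  # (bucket, key) -> ObjectMeta; a real system shards this

    def _pick_nodes(self, bucket, key):
        # Placeholder placement: hash the key to a starting position in the node list.
        start = int(hashlib.sha256(f"{bucket}/{key}".encode()).hexdigest(), 16)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.replication)]

    def put(self, bucket, key, data: bytes) -> str:
        blob_id = uuid.uuid4().hex  # blobs are immutable; an overwrite creates a new blob
        targets = self._pick_nodes(bucket, key)
        acked = [n.name for n in targets if n.write(blob_id, data)]
        if len(acked) < self.write_quorum:
            raise OSError(f"only {len(acked)} replicas acknowledged")
        # Commit metadata only after the data is stored, so readers never see a
        # key whose bytes are missing. Background repair (not shown) would bring
        # under-replicated blobs back to full replication.
        digest = hashlib.sha256(data).hexdigest()
        self.metadata[(bucket, key)] = ObjectMeta(blob_id, len(data), digest, acked)
        return blob_id

    def get(self, bucket, key) -> bytes:
        meta = self.metadata.get((bucket, key))
        if meta is None:
            raise KeyError(f"{bucket}/{key}")
        by_name = {n.name: n for n in self.nodes}
        for name in meta.replicas:
            data = by_name[name].read(meta.blob_id)
            # Verify the checksum so a corrupted or unavailable replica is skipped.
            if data is not None and hashlib.sha256(data).hexdigest() == meta.sha256:
                return data
        raise OSError("no healthy replica available")

    def delete(self, bucket, key) -> None:
        # Drop the key from the metadata index; a background garbage collector
        # (not shown) would later reclaim the unreferenced blob bytes.
        self.metadata.pop((bucket, key), None)
```

Committing metadata after the data write is one common ordering choice. A failed or partial upload then leaves only unreferenced bytes for garbage collection, never a key that points at missing data. The cost is that PUT latency includes the replica writes that must finish before the commit.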
Quick Answer: This question evaluates a candidate's ability to design scalable, durable, and highly available blob/object storage systems. It covers distributed systems architecture, data modeling for buckets and objects, metadata management, partitioning and routing, fault-tolerance strategies, consistency models, large-object transfer patterns, security controls, and operational concerns such as monitoring and capacity planning. It is commonly asked in system design interviews to assess how candidates apply architectural patterns and reason about trade-offs (horizontal scalability, durability versus latency, consistency guarantees, and operational resilience). It primarily tests practical system-design skills grounded in a conceptual understanding of distributed systems.