How do I approach System Design interview questions?

System Design questions require an understanding of core concepts, plus practice. PracHub provides solutions with explanations to help you master system design interviews.

What difficulty level is this interview question?

This is a hard-difficulty System Design question, commonly asked during Technical Screen rounds at Amazon.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Amazon during technical interviews.

Design a replicated cloud storage service | Amazon Interview Question

Q: Design a replicated cloud storage service

This question evaluates a candidate's system-design competency in distributed cloud storage, covering metadata/data separation, replication models, read/write paths, failure recovery, and operational trade-offs.

Design the internals of a cloud storage service (object/blob storage). Focus on storage/infra concerns rather than end-user features.

Cover the following:

High-level architecture
- Separate data plane (serving reads/writes of blobs) vs control/metadata plane (namespaces, object locations, versions, ACLs).
- Key services/components you would expect (frontends, metadata service, storage nodes, background repair, monitoring).
Metadata vs data relationship
- What metadata is stored (object ID, size, checksums, replication state, versioning, location pointers).
- How metadata points to data chunks/segments and how you avoid metadata becoming a bottleneck.
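
As one way to picture the metadata-to-data relationship, here is a minimal sketch in which an object's metadata record holds an ordered list of chunk references, each naming the nodes that store that chunk. Everything here is an assumption for illustration: the names `ChunkRef`, `ObjectMetadata`, and `build_metadata`, the tiny `CHUNK_SIZE`, and the round-robin placement are hypothetical and not part of the question.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List

CHUNK_SIZE = 4  # bytes; deliberately tiny for illustration (real systems use MiB-scale chunks)


@dataclass
class ChunkRef:
    chunk_id: str        # content hash of the chunk bytes
    size: int            # chunk length in bytes
    replicas: List[str]  # hypothetical storage node names holding this chunk


@dataclass
class ObjectMetadata:
    object_id: str
    version: int
    size: int
    checksum: str        # whole-object checksum
    chunks: List[ChunkRef] = field(default_factory=list)


def build_metadata(object_id, data, nodes, replication_factor=3, version=1):
    """Split data into fixed-size chunks and record where each chunk would live.

    Placement is a simple round-robin (an assumption); it yields distinct
    replicas only when replication_factor <= len(nodes).
    """
    chunks = []
    for offset in range(0, len(data), CHUNK_SIZE):
        piece = data[offset:offset + CHUNK_SIZE]
        start = (offset // CHUNK_SIZE) % len(nodes)
        replicas = [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]
        chunks.append(ChunkRef(hashlib.sha256(piece).hexdigest(), len(piece), replicas))
    return ObjectMetadata(object_id, version, len(data),
                          hashlib.sha256(data).hexdigest(), chunks)


meta = build_metadata("obj-1", b"abcdefghij", ["n1", "n2", "n3", "n4"])
for ref in meta.chunks:
    print(ref.size, ref.replicas)
```

Because the record stores only small pointers rather than blob bytes, the metadata service can be sharded by `object_id` and cached independently of the data plane, which is one common way to keep it from becoming a bottleneck.
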
Replication model
- Choose a replication approach (e.g., primary/secondary, quorum replication, chain replication, erasure coding) and justify it.
- Define durability and availability goals (e.g., tolerate N failures) and what “commit” means.
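
If quorum replication is the chosen model, the usual arithmetic can be checked with a few lines of code. The sketch below assumes N replicas, a write quorum W, and a read quorum R; the function name `quorum_properties` and the dictionary keys are hypothetical.

```python
def quorum_properties(n, w, r):
    """Summarize what a given (N, W, R) quorum configuration guarantees."""
    return {
        # Every read quorum overlaps every write quorum, so a read
        # contacts at least one replica holding the latest committed write.
        "read_overlaps_write": w + r > n,
        # Any two write quorums overlap, so two conflicting writes
        # cannot both commit on disjoint sets of replicas.
        "writes_overlap": 2 * w > n,
        # Number of replicas that can be down while writes/reads still succeed.
        "write_failures_tolerated": n - w,
        "read_failures_tolerated": n - r,
    }


print(quorum_properties(3, 2, 2))
print(quorum_properties(3, 1, 1))
```

Under this model, "commit" would mean that W replicas have durably acknowledged the write. With N=3, W=2, R=2 both overlap conditions hold and one failure is tolerated on each path; with W=1, R=1 latency improves but reads may return stale data.
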
Write path and read path
- Step-by-step request flow for a PUT/WRITE and GET/READ.
- When you acknowledge a write to the client.
- Caching and hot-object optimizations (optional).
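
To make the acknowledgement point concrete, the following is a rough in-memory sketch of a quorum write and read path. It assumes versioned values and sequential replica calls for simplicity (a real frontend would issue them in parallel with timeouts). `InMemoryReplica`, `replicated_put`, and `replicated_get` are hypothetical names, not an existing API.

```python
class InMemoryReplica:
    """Hypothetical storage node; healthy=False simulates an unreachable node."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.store = {}  # key -> (version, value)

    def put(self, key, version, value):
        if not self.healthy:
            raise ConnectionError(f"{self.name} unreachable")
        self.store[key] = (version, value)

    def get(self, key):
        if not self.healthy:
            raise ConnectionError(f"{self.name} unreachable")
        return self.store.get(key)


def replicated_put(replicas, key, version, value, write_quorum):
    """Return True (acknowledge the client) only if write_quorum replicas accepted."""
    acks = 0
    for replica in replicas:
        try:
            replica.put(key, version, value)
            acks += 1
        except ConnectionError:
            continue  # a failed replica is left for background repair
    # Note: a False result can still leave partial state on some replicas.
    return acks >= write_quorum


def replicated_get(replicas, key, read_quorum):
    """Read from reachable replicas and return the value with the highest version."""
    responses = []
    for replica in replicas:
        try:
            responses.append(replica.get(key))
        except ConnectionError:
            continue
    if len(responses) < read_quorum:
        raise RuntimeError("read quorum not reached")
    found = [r for r in responses if r is not None]
    return max(found, key=lambda r: r[0])[1] if found else None


nodes = [InMemoryReplica("a"), InMemoryReplica("b"), InMemoryReplica("c", healthy=False)]
print(replicated_put(nodes, "photo", 1, b"v1", write_quorum=2))
print(replicated_get(nodes, "photo", read_quorum=2))
```

In this sketch the client is acknowledged once two of three replicas accept the write, so one slow or failed replica does not block the request.
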
Trade-offs
- How you balance durability vs performance (sync vs async replication, quorum size, batching).
- Consistency choices (strong/eventual) and how clients observe them.
Failure handling and recovery
- Node failure detection, re-replication/reconstruction, data scrubbing, and recovery workflow.
- What happens during partial failures (e.g., one replica slow, metadata unavailable).
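
The re-replication step can be sketched as a scan that compares each chunk's recorded locations against the set of live nodes and proposes new targets for missing copies. This is an illustrative assumption about how a repair planner might work; `find_under_replicated` and its arguments are hypothetical, and the set of live nodes is assumed to come from some heartbeat-based failure detector.

```python
def find_under_replicated(chunk_locations, live_nodes, replication_factor):
    """Return {chunk_id: [target nodes]} for chunks with too few live replicas.

    chunk_locations: dict of chunk_id -> list of node names from metadata
    live_nodes: set of node names currently considered healthy
    """
    repairs = {}
    for chunk_id, nodes in chunk_locations.items():
        healthy = [n for n in nodes if n in live_nodes]
        missing = replication_factor - len(healthy)
        if missing > 0:
            # Candidate targets: live nodes that never held this chunk.
            # Sorted only to make the sketch deterministic.
            candidates = sorted(live_nodes - set(nodes))
            repairs[chunk_id] = candidates[:missing]
    return repairs


locations = {"c1": ["n1", "n2", "n3"], "c2": ["n2", "n3", "n4"]}
live = {"n1", "n2", "n4", "n5"}  # n3 has failed
print(find_under_replicated(locations, live, replication_factor=3))
```

A production planner would also weigh rack or zone diversity, node capacity, and repair bandwidth limits when choosing targets; this sketch ignores all of those.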
