Design a cloud database write path and recovery

This is a System Design interview question from Amazon for Software Engineer roles.

Question

System Design (Engine-level): Write Path + Crash Recovery

Design a core subsystem for a cloud-native relational database (Aurora-like) where compute is separated from durable distributed storage.

Goal

Support transactional writes with:

high throughput
low commit latency
crash recovery
strong durability guarantees (clearly specify what guarantees)

Requirements / prompts

Write path: Describe how an UPDATE/INSERT flows from compute to durable storage. Where do you place the log (WAL)?
Commit protocol: When does a transaction commit succeed? What acknowledgements are required?
Replication & consistency: How many replicas, what quorum rules, and how do you handle network partitions?
Crash recovery: If the compute node crashes, how does a new node recover state and resume service? What data structures/checkpoints exist?
Write amplification: Identify sources (WAL, page rewrites, compaction) and propose reductions.
Scalability: How do you scale storage and compute independently? Discuss sharding, rebalancing, and hotspot handling.
Observability: What metrics and logs would you add to detect replication lag, redo backlog, and tail latency?
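
One way to make the commit-protocol and quorum prompts concrete is a small sketch of quorum-acknowledged log records. This is an illustration under stated assumptions, not a reference answer: it assumes 6 storage replicas with a write quorum of 4 and a read quorum of 3 (the configuration commonly described for Aurora-style storage), dense integer LSNs starting at 1, and a rule that a commit is acknowledged only once every log record up to the commit LSN has reached the write quorum. The names QuorumConfig and CommitTracker are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QuorumConfig:
    # Assumed defaults: 6 replicas, 4/6 write quorum, 3/6 read quorum.
    replicas: int = 6
    write_quorum: int = 4
    read_quorum: int = 3

    def is_valid(self) -> bool:
        # Any read set must intersect any write set (R + W > N), and any
        # two write sets must intersect (2W > N) to avoid split writes.
        overlaps_reads = self.write_quorum + self.read_quorum > self.replicas
        overlaps_writes = 2 * self.write_quorum > self.replicas
        return overlaps_reads and overlaps_writes


@dataclass
class CommitTracker:
    """Tracks storage acks per LSN on the compute (writer) node."""
    config: QuorumConfig
    acks: dict = field(default_factory=dict)  # LSN -> set of replica ids
    durable_lsn: int = 0  # highest LSN with no quorum gaps below it

    def record_ack(self, lsn: int, replica_id: str) -> None:
        if lsn <= self.durable_lsn:
            return  # late ack for a record that is already durable
        self.acks.setdefault(lsn, set()).add(replica_id)
        self._advance()

    def _advance(self) -> None:
        # Move durable_lsn forward across contiguous quorum-acked records.
        next_lsn = self.durable_lsn + 1
        while len(self.acks.get(next_lsn, ())) >= self.config.write_quorum:
            del self.acks[next_lsn]
            self.durable_lsn = next_lsn
            next_lsn += 1

    def can_ack_commit(self, commit_lsn: int) -> bool:
        # Reply "committed" to the client only when the commit record and
        # everything before it are durable on a write quorum.
        return commit_lsn <= self.durable_lsn


tracker = CommitTracker(QuorumConfig())
for replica in ("a", "b", "c", "d"):
    tracker.record_ack(1, replica)
print(tracker.can_ack_commit(1))
```

In this sketch an ack for LSN 2 that arrives before LSN 1 reaches quorum does not advance the durable point, which is the property crash recovery relies on: a new compute node can treat the highest gap-free quorum-durable LSN as the end of the log and discard anything beyond it.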
