You are asked to design a blob storage system deployed on a lunar base. The interviewer is intentionally making the environment unusual (the Moon) to test your ability to reason from first principles.
Assume the system should:
- Store large binary objects ("blobs"), such as scientific data files and logs, each from a few KB up to several TB.
- Support basic operations:
  - PUT (upload a blob)
  - GET (download a blob)
  - DELETE (remove a blob)
- Provide durability and availability on the Moon, even with:
  - Limited hardware (a small cluster of storage nodes).
  - High-latency, intermittent links between Earth and Moon.
- Handle reads and writes primarily from clients on the Moon, with occasional replication to Earth for backup.
Design the system. In your answer, cover:
- High-level architecture
  - Components (clients, API gateway, metadata service, storage nodes, background replication, etc.).
  - How clients interact with the system for uploads/downloads.
- Data model and metadata
  - How blobs are identified (IDs, paths, or both).
  - What metadata is stored (size, checksums, versioning, timestamps, ACLs).
  - How metadata is stored and made highly available on the Moon.
- Storage layout and durability
  - How blobs are split into chunks or stored as whole objects.
  - How you achieve durability with a small number of nodes (e.g., replication factor, erasure coding).
  - Trade-offs between replication vs. erasure coding in a resource-constrained environment.
- Consistency and replication model
  - Consistency guarantees for clients on the Moon (e.g., read-after-write for a single blob).
  - How and when data is replicated to Earth, given high latency and intermittent connectivity.
  - What to do if Earth is unreachable for long periods.
- Fault tolerance and operations
  - Handling node failures on the Moon.
  - Detecting corruption (e.g., via checksums) and repairing data.
  - Backup and disaster recovery strategy, including how Earth copies are used.
- APIs and security
  - Basic REST-like APIs for blob operations.
  - Authentication and authorization assumptions (e.g., per-project or per-user access control).
Make reasonable assumptions where needed (e.g., expected capacity, QPS, number of nodes) and state them explicitly. Focus on clearly explaining your design choices and trade-offs in the context of a constrained lunar environment with unreliable Earth connectivity.
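
As an illustration of what stating assumptions explicitly can look like, the sketch below compares 3x replication with a 4+2 erasure-coding layout on a small cluster. Every number and name in it is a hypothetical assumption made for the example (6 storage nodes, 100 TB of raw disk per node, the `Scheme` class, the `usable_capacity_tb` helper); none of them comes from the problem statement, and a real answer should substitute its own figures.

```python
from dataclasses import dataclass

# Hypothetical cluster assumptions, chosen only for illustration.
NODE_COUNT = 6
RAW_TB_PER_NODE = 100.0


@dataclass(frozen=True)
class Scheme:
    """A redundancy scheme described as k data fragments plus m extra fragments.

    N-way replication is modeled as k=1, m=N-1 (one original plus N-1 copies).
    """
    name: str
    data_fragments: int    # k
    parity_fragments: int  # m

    @property
    def storage_overhead(self) -> float:
        # Raw bytes written per logical byte stored: (k + m) / k.
        return (self.data_fragments + self.parity_fragments) / self.data_fragments

    @property
    def tolerated_failures(self) -> int:
        # Any m fragments can be lost without losing the blob.
        return self.parity_fragments

    @property
    def min_nodes(self) -> int:
        # One fragment per node, so at least k + m nodes are needed.
        return self.data_fragments + self.parity_fragments


def usable_capacity_tb(raw_tb: float, scheme: Scheme) -> float:
    """Logical capacity available after paying the redundancy overhead."""
    return raw_tb / scheme.storage_overhead


REPLICATION_3X = Scheme("3x replication", data_fragments=1, parity_fragments=2)
EC_4_2 = Scheme("4+2 erasure coding", data_fragments=4, parity_fragments=2)

if __name__ == "__main__":
    raw_tb = NODE_COUNT * RAW_TB_PER_NODE
    for scheme in (REPLICATION_3X, EC_4_2):
        print(
            f"{scheme.name}: overhead {scheme.storage_overhead:.1f}x, "
            f"tolerates {scheme.tolerated_failures} node failures, "
            f"needs >= {scheme.min_nodes} nodes, "
            f"usable {usable_capacity_tb(raw_tb, scheme):.0f} TB"
        )
```

Under these assumed numbers both schemes survive the loss of two nodes, but 4+2 erasure coding leaves 400 TB usable against 200 TB for 3x replication. The sketch deliberately ignores the other side of the trade-off: erasure coding costs more CPU and more network traffic during reads and repairs, and on a 6-node cluster a 4+2 stripe occupies every node, so there is no spare node to rebuild onto after a failure.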