Design a blob storage system for a lunar environment
Company: Coinbase
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Technical Screen
You are asked to design a **blob storage system deployed on a lunar base**. The interviewer is intentionally making the environment unusual (the Moon) to test your ability to reason from first principles.
Assume the system should:
- Store large binary objects ("blobs"), such as scientific data files and logs, each from a few KB up to several TB.
- Support basic operations:
  - `PUT` (upload a blob)
  - `GET` (download a blob)
  - `DELETE` (remove a blob)
- Provide durability and availability **on the Moon**, even with:
  - Limited hardware (a small cluster of storage nodes).
  - High-latency, intermittent links between Earth and Moon.
- Handle reads and writes primarily from **clients on the Moon**, with occasional replication to Earth for backup.
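To make the three required operations concrete, here is a minimal single-process sketch in Python. It is only an illustration of the interface, not a proposed design: the class name `InMemoryBlobStore`, the use of random hex IDs, and the choice of SHA-256 for integrity checks are all assumptions that do not come from the question. A real answer would add chunking, replication across storage nodes, and persistent metadata.

```python
import hashlib
import uuid


class InMemoryBlobStore:
    """Toy single-node store (hypothetical); no chunking or replication."""

    def __init__(self):
        # blob_id -> (data, SHA-256 hex digest recorded at upload time)
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store a blob and return a newly generated ID."""
        blob_id = uuid.uuid4().hex
        self._blobs[blob_id] = (data, hashlib.sha256(data).hexdigest())
        return blob_id

    def get(self, blob_id: str) -> bytes:
        """Return the blob, verifying it against the stored checksum."""
        data, checksum = self._blobs[blob_id]  # raises KeyError if missing
        if hashlib.sha256(data).hexdigest() != checksum:
            raise IOError(f"checksum mismatch for blob {blob_id}")
        return data

    def delete(self, blob_id: str) -> bool:
        """Remove the blob; return True if it existed."""
        return self._blobs.pop(blob_id, None) is not None


store = InMemoryBlobStore()
blob_id = store.put(b"regolith sample log")
payload = store.get(blob_id)
removed = store.delete(blob_id)
removed_again = store.delete(blob_id)
```

Recording a checksum at `PUT` time and verifying it on `GET` is one simple way to surface silent corruption; whether verification happens on every read or in a background scrub is a design choice the candidate would need to justify.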
Design the system. In your answer, cover:
1. **High-level architecture**
   - Components (clients, API gateway, metadata service, storage nodes, background replication, etc.).
   - How clients interact with the system for uploads/downloads.
2. **Data model and metadata**
   - How blobs are identified (IDs, paths, or both).
   - What metadata is stored (size, checksums, versioning, timestamps, ACLs).
   - How metadata is stored and made highly available on the Moon.
3. **Storage layout and durability**
   - How blobs are split into chunks or stored as whole objects.
   - How you achieve durability with a small number of nodes (e.g., replication factor, erasure coding).
   - Trade-offs between replication and erasure coding in a resource-constrained environment.
4. **Consistency and replication model**
   - Consistency guarantees for clients on the Moon (e.g., read-after-write for a single blob).
   - How and when data is replicated to Earth, given high latency and intermittent connectivity.
   - What to do if Earth is unreachable for long periods.
5. **Fault tolerance and operations**
   - Handling node failures on the Moon.
   - Detecting corruption (e.g., via checksums) and repairing data.
   - Backup and disaster recovery strategy, including how Earth copies are used.
6. **APIs and security**
   - Basic REST-like APIs for blob operations.
   - Authentication and authorization assumptions (e.g., per-project or per-user access control).
Make reasonable assumptions where needed (e.g., expected capacity, QPS, number of nodes) and state them explicitly. Focus on clearly explaining your design choices and trade-offs in the context of a constrained lunar environment with unreliable Earth connectivity.
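When stating assumptions for the replication versus erasure coding trade-off, it can help to put numbers on it. The sketch below compares raw storage overhead, tolerated node failures, and minimum node count for the two schemes. The function names, the 100 TB capacity figure, and the specific 3-copy and 4+2 configurations are illustrative assumptions, not values given in the question. The erasure coding model assumes an MDS code such as Reed-Solomon with one shard per node.

```python
def replication_profile(copies: int) -> dict:
    """Full-copy replication: every node holds a complete copy of the data."""
    return {
        "overhead": float(copies),         # raw bytes stored per logical byte
        "failures_tolerated": copies - 1,  # data survives while one copy remains
        "nodes_needed": copies,
    }


def erasure_profile(data_shards: int, parity_shards: int) -> dict:
    """k+m erasure coding (MDS code assumed): any k of k+m shards rebuild the data."""
    total = data_shards + parity_shards
    return {
        "overhead": total / data_shards,
        "failures_tolerated": parity_shards,
        "nodes_needed": total,             # one shard per node
    }


logical_tb = 100  # assumed logical capacity, for illustration only

three_way = replication_profile(3)
rs_4_2 = erasure_profile(4, 2)

raw_tb_replicated = logical_tb * three_way["overhead"]
raw_tb_erasure = logical_tb * rs_4_2["overhead"]
```

Under these assumptions both schemes tolerate two node failures, but 3-way replication needs 300 TB of raw disk on 3 nodes while 4+2 erasure coding needs 150 TB spread over at least 6 nodes. Erasure coding also costs more CPU and network traffic during repair, which matters on a small lunar cluster.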
Quick Answer: This question evaluates system design and distributed storage skills, focusing on durability, replication, metadata management, consistency, and trade-offs under resource-constrained, high-latency conditions. It belongs to the System Design category and tests knowledge of distributed systems, storage architecture, and networking.