Design a Distributed Message Queue for High Throughput and Large Payloads
Context
You are asked to design a distributed message queue system that supports very high throughput for producers and consumers and can handle large message payloads. The design should cover two deployment forms: a simple in-process queue (for a single application) and a multi-tenant, cloud-hosted service.
Requirements
- APIs and Resource Model
  - Topics/queues: create, update, delete, list.
  - Produce and consume semantics: batching, acknowledgments, consumer groups, ordering, and offset management.
  - Deployment/administration: provisioning, scaling, health, metrics, and upgrades.
- Reliability and Recovery
  - Strategies to prevent data loss and handle failures: acknowledgments, retries, backoff, dead-letter queues (DLQ), deduplication, and recovery from broker or consumer failures.
- Architecture and Trade-offs
  - Contrast a single-process, in-memory queue class + API against a multi-tenant cloud service.
  - Storage choices for large payloads; partitioning and sharding strategy; replication and placement; resource isolation across tenants.
- Performance
  - Consider throughput, latency, batching, flow control/backpressure, and scaling strategies.
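To make the batching and flow control/backpressure items concrete, here is a minimal single-process sketch. It assumes a bounded buffer in which a full queue pushes back on producers (they block or time out) and consumers drain messages in batches. The class name `BoundedBatchQueue` and its methods are hypothetical, chosen only for illustration; the only library used is Python's standard `queue` module.

```python
from __future__ import annotations

import queue


class BoundedBatchQueue:
    """Hypothetical bounded buffer: a full queue applies backpressure to producers."""

    def __init__(self, max_messages: int):
        # queue.Queue blocks put() once maxsize items are buffered.
        self._q: queue.Queue = queue.Queue(maxsize=max_messages)

    def produce(self, message: bytes, timeout: float | None = None) -> bool:
        """Enqueue one message; return False if the buffer stays full past `timeout`.

        With timeout=None the caller blocks until space is available.
        """
        try:
            self._q.put(message, timeout=timeout)
            return True
        except queue.Full:
            return False

    def consume_batch(self, max_batch: int, wait: float = 0.1) -> list[bytes]:
        """Wait up to `wait` seconds for a first message, then drain up to
        `max_batch` messages without blocking further."""
        batch: list[bytes] = []
        try:
            batch.append(self._q.get(timeout=wait))
            while len(batch) < max_batch:
                batch.append(self._q.get_nowait())
        except queue.Empty:
            pass  # fewer than max_batch messages were available
        return batch
```

A producer that receives `False` can slow down, shed load, or retry later. A networked service would need an equivalent signal on the wire (for example quotas or credit-based flow control), which this sketch does not attempt to model.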
Provide a clear design, API sketches, and rationale for key choices.
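As a starting point for the in-process variant, the following is one possible API sketch, not a reference design. It assumes at-least-once delivery with explicit acknowledgments, a fixed retry limit, a dead-letter list, and deduplication by a producer-supplied message ID. All names (`InProcessQueue`, `Delivery`, `produce`, `consume`, `ack`, `nack`) are hypothetical. Backoff, persistence, visibility timeouts, and thread safety are deliberately left out.

```python
from __future__ import annotations

import itertools
from collections import deque
from dataclasses import dataclass


@dataclass
class Delivery:
    receipt: int      # handle the consumer passes back to ack() or nack()
    message_id: str
    payload: bytes
    attempts: int     # delivery attempts so far, including this one


class InProcessQueue:
    """Hypothetical single-process queue: ack/nack, bounded retries, DLQ, dedup."""

    def __init__(self, max_attempts: int = 3):
        self._ready: deque = deque()       # (message_id, payload, attempts)
        self._in_flight: dict = {}         # receipt -> (message_id, payload, attempts)
        self._seen_ids: set = set()        # assumption: unbounded dedup set
        self._receipts = itertools.count(1)
        self._max_attempts = max_attempts
        self.dead_letters: list = []       # (message_id, payload) after final failure

    def produce(self, message_id: str, payload: bytes) -> bool:
        """Enqueue unless this message_id was already accepted (deduplication)."""
        if message_id in self._seen_ids:
            return False
        self._seen_ids.add(message_id)
        self._ready.append((message_id, payload, 0))
        return True

    def consume(self) -> Delivery | None:
        """Hand out the oldest ready message and track it as in flight."""
        if not self._ready:
            return None
        message_id, payload, attempts = self._ready.popleft()
        attempts += 1
        receipt = next(self._receipts)
        self._in_flight[receipt] = (message_id, payload, attempts)
        return Delivery(receipt, message_id, payload, attempts)

    def ack(self, receipt: int) -> None:
        """Processing succeeded: forget the in-flight message."""
        self._in_flight.pop(receipt, None)

    def nack(self, receipt: int) -> None:
        """Processing failed: requeue, or dead-letter once attempts are exhausted."""
        entry = self._in_flight.pop(receipt, None)
        if entry is None:
            return
        message_id, payload, attempts = entry
        if attempts >= self._max_attempts:
            self.dead_letters.append((message_id, payload))
        else:
            self._ready.append((message_id, payload, attempts))
```

Note that a requeued message goes to the back of the queue, so this sketch trades strict ordering for simplicity. A cloud-hosted design would replace the in-memory structures with a replicated, partitioned log and would most likely store large payloads out of line, passing references through the queue.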