Design a Distributed Append-Only Log Storage System
You are asked to design the storage layer of a distributed, partitioned, replicated append-only log service that supports:
- Append-only writes with high throughput
- Retention policies (by time and/or size)
- Partitioning for horizontal scalability
- Replication for fault tolerance
- High-throughput sequential reads
Detail the following components and behaviors:
- Segment management: active vs. sealed segments, rollover conditions, deletion, and preallocation.
- Compaction: when and how to compact, tombstones/deletes, write amplification tradeoffs.
- Indexing: offset-to-file-position index, time index, sparsity, and rebuild.
- Leader/follower roles: write and read paths, quorum/ack semantics, high watermark, lag handling.
- Recovery: crash recovery, index rebuild, truncation, leader election safety, and follower catch-up.
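To make the segment and indexing terms above concrete, here is a minimal in-memory sketch of one partition with size-based rollover and a sparse offset-to-position index. It is one possible starting point, not a reference design: the names PartitionLog, Segment, max_segment_bytes, and index_interval_bytes are hypothetical, records are assumed to be framed with a 4-byte length prefix, and a real system would write to files with fsync, preallocation, and a time index.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int                      # offset of the first record in this segment
    next_offset: int = 0                  # offset the next appended record will receive
    data: bytearray = field(default_factory=bytearray)
    index: list = field(default_factory=list)  # sparse (offset, byte position) entries
    bytes_since_index: int = 0
    sealed: bool = False


class PartitionLog:
    """Hypothetical single-partition log held in memory (illustration only)."""

    def __init__(self, max_segment_bytes=64, index_interval_bytes=16):
        self.max_segment_bytes = max_segment_bytes        # assumed rollover threshold
        self.index_interval_bytes = index_interval_bytes  # assumed index sparsity
        self.segments = [Segment(base_offset=0, next_offset=0)]

    def append(self, payload: bytes) -> int:
        active = self.segments[-1]
        record_size = 4 + len(payload)  # 4-byte length prefix plus payload
        # Roll over: seal the active segment and open a new one when it would overflow.
        if active.data and len(active.data) + record_size > self.max_segment_bytes:
            active.sealed = True
            active = Segment(base_offset=active.next_offset,
                             next_offset=active.next_offset)
            self.segments.append(active)
        offset, position = active.next_offset, len(active.data)
        # Sparse index: add an entry only every index_interval_bytes of log data.
        if not active.index or active.bytes_since_index >= self.index_interval_bytes:
            active.index.append((offset, position))
            active.bytes_since_index = 0
        active.data += len(payload).to_bytes(4, "big") + payload
        active.bytes_since_index += record_size
        active.next_offset += 1
        return offset

    def read(self, offset: int) -> bytes:
        # Step 1: binary search the segment whose base offset is <= the target.
        bases = [s.base_offset for s in self.segments]
        seg_i = bisect.bisect_right(bases, offset) - 1
        if seg_i < 0:
            raise KeyError(offset)
        seg = self.segments[seg_i]
        if not (seg.base_offset <= offset < seg.next_offset):
            raise KeyError(offset)
        # Step 2: binary search the sparse index for the nearest entry at or below.
        indexed = [o for o, _ in seg.index]
        current, pos = seg.index[bisect.bisect_right(indexed, offset) - 1]
        # Step 3: scan forward record by record until the target offset is reached.
        while True:
            size = int.from_bytes(seg.data[pos:pos + 4], "big")
            if current == offset:
                return bytes(seg.data[pos + 4:pos + 4 + size])
            pos += 4 + size
            current += 1
```

For example, appending ten 8-byte payloads with the defaults above seals the first segment after five records, and log.read(3) is served by jumping to the index entry for offset 2 and scanning one record forward. The sparsity setting trades index size against the length of that forward scan.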
Assume a large-scale multi-node deployment with commodity disks and network, and that clients produce and consume records identified by monotonically increasing offsets within each partition.
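Because offsets increase monotonically within a partition, replication progress can be summarized by a single number per replica. The sketch below shows one common way to derive a high watermark from that, assuming a Kafka-style pull model in which a follower's fetch offset implies it has stored everything below it. PartitionLeader, on_fetch, and read_committed are hypothetical names, and the sketch treats every follower that has fetched as in-sync; it does not model lag-based removal from the replica set, leader epochs, or durability.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLeader:
    """Hypothetical leader state for one partition (illustration only)."""
    log: list = field(default_factory=list)
    # follower id -> next offset that follower needs (its log end offset)
    follower_leo: dict = field(default_factory=dict)

    def append(self, record) -> int:
        # The leader assigns offsets; the new record is not yet committed.
        self.log.append(record)
        return len(self.log) - 1

    def on_fetch(self, follower_id, fetch_offset, max_records=100):
        # Assumption: a fetch at offset N acknowledges all records below N.
        self.follower_leo[follower_id] = fetch_offset
        return self.log[fetch_offset:fetch_offset + max_records]

    @property
    def high_watermark(self) -> int:
        # Committed prefix: the smallest log end offset across leader and followers.
        return min([len(self.log), *self.follower_leo.values()])

    def read_committed(self, offset, max_records=100):
        # Consumers only see records below the high watermark.
        end = min(self.high_watermark, offset + max_records)
        return self.log[offset:end]
```

With three records appended and two followers that have fetched up to offsets 3 and 2, high_watermark is 2, so read_committed(0) returns only the first two records. The record at offset 2 becomes visible once the slower follower's next fetch advances past it.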