Design a distributed log storage service

Q: Design a distributed log storage service

This question evaluates a candidate's expertise in designing distributed storage systems, covering partitioned append-only logs, high-throughput writes and reads, replication, retention, indexing, compaction, leader/follower roles, and recovery mechanisms.

Question

Design a Distributed Append-Only Log Storage System

You are asked to design the storage layer of a distributed, partitioned, replicated append-only log service that supports:

Append-only writes with high throughput
Retention policies (by time and/or size)
Partitioning for horizontal scalability
Replication for fault tolerance
High-throughput sequential reads

Detail the following components and behaviors:

Segment management: active vs. sealed segments, rollover conditions, deletion, and preallocation.
Compaction: when and how to compact, tombstones/deletes, write amplification tradeoffs.
Indexing: offset-to-file-position index, time index, sparsity, and rebuild.
Leader/follower roles: write and read paths, quorum/ack semantics, high watermark, lag handling.
Recovery: crash recovery, index rebuild, truncation, leader election safety, and follower catch-up.
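
To make the segment management and indexing items above more concrete, here is a minimal in-memory sketch of one possible layout. It is not a reference solution: the names `Segment` and `PartitionLog`, the 4-byte length prefix, and the parameters `index_interval_bytes` and `max_segment_bytes` are all assumptions made for illustration. The log rolls to a new segment when the active one would exceed a size limit, and each segment keeps a sparse offset-to-position index, so a read does a binary search followed by a short forward scan.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One segment: a byte buffer plus a sparse offset-to-position index."""
    base_offset: int
    index_interval_bytes: int = 64  # hypothetical: add an index entry about every 64 bytes
    data: bytearray = field(default_factory=bytearray)
    index_offsets: list = field(default_factory=list)    # indexed record offsets, ascending
    index_positions: list = field(default_factory=list)  # byte position of each indexed record
    next_offset: int = -1
    sealed: bool = False
    bytes_since_index: int = 0

    def __post_init__(self):
        self.next_offset = self.base_offset

    def append(self, payload: bytes) -> int:
        if self.sealed:
            raise RuntimeError("cannot append to a sealed segment")
        pos = len(self.data)
        # Always index the first record, then one entry per index_interval_bytes.
        if not self.index_offsets or self.bytes_since_index >= self.index_interval_bytes:
            self.index_offsets.append(self.next_offset)
            self.index_positions.append(pos)
            self.bytes_since_index = 0
        record = len(payload).to_bytes(4, "big") + payload  # length-prefixed record
        self.data += record
        self.bytes_since_index += len(record)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset: int) -> bytes:
        if not (self.base_offset <= offset < self.next_offset):
            raise KeyError(offset)
        # Binary search for the greatest indexed offset <= target, then scan forward.
        i = bisect.bisect_right(self.index_offsets, offset) - 1
        cur, pos = self.index_offsets[i], self.index_positions[i]
        while True:
            length = int.from_bytes(self.data[pos:pos + 4], "big")
            if cur == offset:
                return bytes(self.data[pos + 4:pos + 4 + length])
            pos += 4 + length
            cur += 1


class PartitionLog:
    """A single partition: sealed segments followed by one active segment."""

    def __init__(self, max_segment_bytes: int = 256):
        self.max_segment_bytes = max_segment_bytes
        self.segments = [Segment(base_offset=0)]

    def append(self, payload: bytes) -> int:
        active = self.segments[-1]
        # Size-based rollover: seal the active segment and start a new one.
        if active.data and len(active.data) + 4 + len(payload) > self.max_segment_bytes:
            active.sealed = True
            active = Segment(base_offset=active.next_offset)
            self.segments.append(active)
        return active.append(payload)

    def read(self, offset: int) -> bytes:
        # Segments are ordered by base offset, so locate the owner by binary search.
        bases = [s.base_offset for s in self.segments]
        i = bisect.bisect_right(bases, offset) - 1
        if i < 0:
            raise KeyError(offset)
        return self.segments[i].read(offset)
```

In a real design the buffer would be a preallocated file and the index a separate file that can be rebuilt by scanning the segment. A denser index speeds up lookups at the cost of more index data, which is the sparsity tradeoff the question asks about.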

Assume a large-scale multi-node deployment with commodity disks and network, and that clients produce and consume records identified by monotonically increasing offsets within each partition.
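
Given per-partition offsets, one common way to define the high watermark is as the smallest log end offset among the in-sync replicas, similar in spirit to the in-sync replica model used by Apache Kafka. The sketch below illustrates that idea only. The function names, the replica identifiers, and the offset-based `max_lag` rule are assumptions for illustration (a real system might instead judge lag by time).

```python
def high_watermark(log_end_offsets: dict, in_sync: set) -> int:
    """Return the offset below which every in-sync replica holds all records."""
    if not in_sync:
        raise ValueError("in-sync replica set must not be empty")
    return min(log_end_offsets[r] for r in in_sync)


def is_committed(offset: int, hw: int) -> bool:
    """A record is visible to consumers only if it lies below the high watermark."""
    return offset < hw


def shrink_in_sync(leader: str, log_end_offsets: dict, in_sync: set, max_lag: int) -> set:
    """Drop followers that trail the leader by more than max_lag records (assumed rule)."""
    leader_end = log_end_offsets[leader]
    return {
        r for r in in_sync
        if r == leader or leader_end - log_end_offsets[r] <= max_lag
    }


# Usage: replica "c" lags far behind, so it holds the high watermark back
# until it is removed from the in-sync set.
ends = {"a": 10, "b": 8, "c": 3}
isr = {"a", "b", "c"}
hw_before = high_watermark(ends, isr)              # limited by "c"
isr = shrink_in_sync("a", ends, isr, max_lag=4)    # "c" trails by 7 and is dropped
hw_after = high_watermark(ends, isr)               # now limited by "b"
```

Shrinking the in-sync set lets the high watermark advance but reduces the number of copies behind each acknowledged write, so a design would normally pair it with a minimum in-sync replica count.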
