System Design: High-Throughput Client-Side Ranged-Read File Cache
You are asked to design and specify a client-side cache that accelerates ranged reads against a remote storage service.
Context and Assumptions
- The remote service supports HTTP Range-style reads: fetch(filename, offset, length).
- There is also an API to download an entire file by filename and known size.
- Workloads consist of many concurrent ranged reads, often overlapping or adjacent.
- The cache persists on local disk across process restarts.
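To make the rest of the design concrete and testable, it helps to pin down the assumed remote interface. Below is a minimal in-memory stand-in; the method names mirror the calls above, but the class itself is purely illustrative:

```python
class InMemoryRemoteStore:
    """Toy stand-in for the remote storage service, useful when testing
    the cache locally. Not part of the required design."""

    def __init__(self, files: dict[str, bytes]):
        self._files = files

    def fetch(self, filename: str, offset: int, length: int) -> bytes:
        """Ranged read, analogous to an HTTP Range request."""
        data = self._files[filename]
        return data[offset:offset + length]

    def download(self, filename: str, size: int) -> bytes:
        """Whole-file download when the total size is already known."""
        data = self._files[filename]
        assert len(data) == size, "caller-supplied size must match"
        return data
```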
Requirements
- Concurrency and coalescing
  - Efficiently handle concurrent requests for overlapping or adjacent ranges.
  - Deduplicate overlapping in-flight downloads so at most one fetch per chunk is issued.
  - Provide thread-safe access and correctness under concurrency.
- Chunking, request strategy, and prefetch
  - Choose a chunk size and justify the trade-offs.
  - Coalesce requests to reduce remote calls while bounding over-fetch.
  - Implement prefetch/read-ahead strategies for sequential access.
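One common strategy is fixed-size chunks aligned to multiples of the chunk size, with adjacent missing chunks merged into a single ranged request. A sketch of the range-to-chunk mapping and a simple coalescing pass (the 4 MiB chunk size and `max_gap` default are illustrative, not prescribed):

```python
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB: balances per-request overhead vs. over-fetch


def chunks_for_range(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Chunk indices covering the byte range [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


def coalesce(indices, max_gap: int = 1):
    """Merge nearby chunk indices into (start, end) runs so adjacent or
    almost-adjacent chunks can be fetched with a single ranged request."""
    runs = []
    for i in sorted(set(indices)):
        if runs and i - runs[-1][1] <= max_gap:
            runs[-1][1] = i  # extend the current run (accepting small over-fetch)
        else:
            runs.append([i, i])
    return [(a, b) for a, b in runs]
```

Allowing `max_gap > 1` deliberately over-fetches small holes between runs, trading a few wasted bytes for fewer round trips.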
- Backpressure and rate limiting
  - Limit remote QPS/throughput and disk I/O concurrency.
  - Throttle or disable prefetch under load; bound all queues.
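Remote QPS is often capped with a token bucket, while disk I/O concurrency is bounded separately with a semaphore. A sketch of the token-bucket half (rate and burst values would be tuned per deployment; this is one possible shape, not the required one):

```python
import threading
import time


class TokenBucket:
    """Simple thread-safe token-bucket rate limiter for remote requests.
    acquire() blocks until enough tokens have accumulated."""

    def __init__(self, rate: float, burst: float):
        self._rate = rate          # tokens refilled per second
        self._capacity = burst     # maximum burst size
        self._tokens = burst
        self._last = time.monotonic()
        self._lock = threading.Lock()

    def acquire(self, tokens: float = 1.0) -> None:
        while True:
            with self._lock:
                now = time.monotonic()
                self._tokens = min(self._capacity,
                                   self._tokens + (now - self._last) * self._rate)
                self._last = now
                if self._tokens >= tokens:
                    self._tokens -= tokens
                    return
                wait = (tokens - self._tokens) / self._rate
            time.sleep(wait)  # sleep outside the lock, then re-check
```

Prefetch requests would acquire from the same bucket but be dropped (rather than queued) when tokens are scarce, which is one way to realize "disable prefetch under load".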
- Cache management
  - Cache indexing and lookup.
  - Eviction policy (e.g., LRU) with a byte budget.
  - Persistence across restarts.
  - Validation and invalidation if the remote file changes (e.g., via ETag or modification time).
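The indexing and eviction requirements can be sketched with an in-memory LRU index over on-disk chunks, where `add` returns the evicted keys so the caller can delete the corresponding chunk files. The class and its method names are illustrative, not a mandated design:

```python
from collections import OrderedDict


class LRUIndex:
    """LRU index over cached chunks with a byte budget.
    Keys are (filename, chunk_index); values are chunk sizes in bytes."""

    def __init__(self, budget_bytes: int):
        self._budget = budget_bytes
        self._entries: "OrderedDict[tuple[str, int], int]" = OrderedDict()
        self._used = 0

    def touch(self, key) -> bool:
        """Return True and mark the chunk most-recently-used if present."""
        if key in self._entries:
            self._entries.move_to_end(key)
            return True
        return False

    def add(self, key, size: int) -> list:
        """Record a chunk; return keys evicted to stay within budget."""
        evicted = []
        if key in self._entries:
            self._used -= self._entries.pop(key)
        self._entries[key] = size
        self._used += size
        # Evict from the LRU end, but never the chunk just inserted.
        while self._used > self._budget and len(self._entries) > 1:
            old_key, old_size = self._entries.popitem(last=False)
            self._used -= old_size
            evicted.append(old_key)
        return evicted
```

For persistence across restarts, the index can be rebuilt on startup by scanning the on-disk chunk files, so it never needs to be the source of truth.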
- Reliability
  - Handle partial failures, retries, and timeouts.
  - Resume or refetch partially downloaded chunks.
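A standard building block for the retry requirement is exponential backoff with jitter, retrying only errors known to be transient. A minimal sketch, assuming `fetch` is whatever callable performs the remote read and that `TimeoutError`/`ConnectionError` stand in for the real transport's transient errors:

```python
import random
import time


def fetch_with_retries(fetch, *args, attempts: int = 4, base_delay: float = 0.1,
                       retryable=(TimeoutError, ConnectionError)):
    """Call fetch(*args), retrying transient errors with exponential
    backoff plus jitter. Non-retryable errors propagate immediately."""
    for attempt in range(attempts):
        try:
            return fetch(*args)
        except retryable:
            if attempt == attempts - 1:
                raise  # budget exhausted: surface the last error
            # Jittered backoff avoids synchronized retry storms.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random())
            time.sleep(delay)
```

Partially written chunks are handled separately: write each chunk to a temporary file and rename it into place only when complete, so a crash leaves either a whole chunk or none.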
- API semantics
  - Define read(filename, offset, length) (sync or async), and streaming semantics if applicable.
  - Define the write policy (e.g., read-through only, or write-through with invalidation).
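One possible shape for these semantics is a read-through API with an incremental streaming variant and explicit invalidation. The class below is a deliberately minimal illustration (it caches whole requested ranges rather than chunks, and `fetch` is a hypothetical callable doing the remote read):

```python
from typing import Iterator


class ReadThroughCache:
    """Illustrative read-through API shape; eviction, persistence, and
    chunking from the other requirements are intentionally omitted."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache: dict[tuple[str, int, int], bytes] = {}

    def read(self, filename: str, offset: int, length: int) -> bytes:
        """Synchronous ranged read, served from cache when possible."""
        key = (filename, offset, length)
        if key not in self._cache:
            self._cache[key] = self._fetch(filename, offset, length)
        return self._cache[key]

    def stream(self, filename: str, offset: int, length: int,
               step: int = 1 << 20) -> Iterator[bytes]:
        """Yield the range in pieces so callers need not buffer it all."""
        end = offset + length
        while offset < end:
            n = min(step, end - offset)
            yield self.read(filename, offset, n)
            offset += n

    def invalidate(self, filename: str) -> None:
        """Drop cached data for a file, e.g. after a write-through update."""
        for key in [k for k in self._cache if k[0] == filename]:
            del self._cache[key]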
Deliverables
- Describe the key data structures and on-disk layout.
- Provide pseudocode coordinating fetch, cache lookup, coalescing, and concurrency control.
- Analyze scalability, correctness under concurrency, and time/space complexity.
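As a starting point for the pseudocode deliverable, a skeletal read path might tie the requirements together as follows; every helper name here (`chunks_for_range`, `coalesce`, `touch`, `get_range`, `store_chunks`, `assemble`) is a placeholder to be defined by the design, not an existing API:

```python
# Pseudocode: not runnable as-is; all helpers are placeholders.
def read(cache, filename, offset, length):
    needed = chunks_for_range(offset, length)            # indices covering the range
    missing = [i for i in needed
               if not cache.index.touch((filename, i))]  # cache lookup + LRU bump
    for start, end in coalesce(missing):                 # one remote call per run
        data = cache.fetcher.get_range(filename, start, end)  # deduped, rate-limited
        cache.store_chunks(filename, start, data)        # write to disk, update index
    return cache.assemble(filename, offset, length)      # stitch bytes from chunks
```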