Design concurrent range-aware file caching client
Company: Databricks
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: HR Screen
Design a high-throughput client-side file cache that serves ranged reads from a remote storage service. The remote API supports fetching by filename with byte offset and length; you also have a method that downloads an entire file by filename and size.

Requirements:
- Support concurrent requests for overlapping/adjacent ranges efficiently.
- Choose chunk size, request coalescing, and prefetching strategies.
- Ensure thread-safe access, deduplicate overlapping in-flight downloads, and provide backpressure/rate limiting.
- Implement cache indexing, eviction (e.g., LRU), persistence, and validation if the remote file changes.
- Handle partial failures, retries, and timeouts.
- Define and justify API semantics (e.g., read(filename, offset, length) and write policy).

Provide key data structures and pseudocode for coordinating fetch, cache lookup, and concurrency control. Analyze scalability, correctness under concurrency, and time/space complexity.
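One common starting point for the chunking requirement is to align every ranged read to fixed-size chunks, so that overlapping and adjacent requests resolve to the same cache keys. The sketch below shows only that arithmetic. The function name `chunk_spans` and the 4 MiB default are assumptions made for illustration, not part of the problem statement.

```python
from typing import List, Tuple

# Assumed default; the right chunk size depends on latency and request cost.
CHUNK_SIZE = 4 * 1024 * 1024


def chunk_spans(offset: int, length: int,
                chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int, int]]:
    """Map a byte range onto fixed-size chunks.

    Returns (chunk_index, start_in_chunk, end_in_chunk) tuples, where the
    end is exclusive and both positions are relative to the chunk start.
    """
    if offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("offset/length must be >= 0 and chunk_size > 0")
    spans = []
    end = offset + length
    pos = offset
    while pos < end:
        index = pos // chunk_size
        chunk_start = index * chunk_size
        lo = pos - chunk_start
        hi = min(end, chunk_start + chunk_size) - chunk_start
        spans.append((index, lo, hi))
        pos = chunk_start + hi  # jump to the first byte not yet covered
    return spans
```

For example, with a chunk size of 16, a read of 25 bytes at offset 10 touches chunks 0, 1, and 2. Two reads whose ranges overlap share at least one chunk index, which is what makes chunk-level deduplication and caching possible.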
Quick Answer: This System Design question evaluates understanding of concurrent, range-aware client-side file caching: concurrency control and deduplication of in-flight requests, chunking and prefetch strategies, persistence, eviction policies, validation, and reliability.
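As a rough illustration of how cache lookup, in-flight deduplication, and LRU eviction might be coordinated, here is a minimal in-memory sketch. It assumes a caller-supplied `fetch(filename, offset, length)` callable that returns bytes (and returns fewer bytes at end of file). `ChunkCache` and all of its members are hypothetical names. Backpressure, retries, timeouts, persistence, and validation against remote changes are deliberately left out.

```python
import threading
from collections import OrderedDict
from concurrent.futures import Future
from typing import Callable, Dict, Tuple

Key = Tuple[str, int]  # (filename, chunk_index)


class ChunkCache:
    """Hypothetical chunk cache: LRU eviction plus single-flight fetches."""

    def __init__(self, fetch: Callable[[str, int, int], bytes],
                 chunk_size: int, capacity_chunks: int) -> None:
        self._fetch = fetch              # assumed remote ranged-read call
        self._chunk_size = chunk_size
        self._capacity = capacity_chunks
        self._lock = threading.Lock()
        self._cache: "OrderedDict[Key, bytes]" = OrderedDict()
        self._inflight: Dict[Key, Future] = {}

    def _get_chunk(self, filename: str, index: int) -> bytes:
        key = (filename, index)
        with self._lock:
            data = self._cache.get(key)
            if data is not None:
                self._cache.move_to_end(key)  # mark as most recently used
                return data
            fut = self._inflight.get(key)
            owner = fut is None
            if owner:
                # First requester registers the download; others will wait.
                fut = Future()
                self._inflight[key] = fut
        if not owner:
            return fut.result()  # blocks; re-raises the owner's exception
        try:
            # The network call runs outside the lock.
            data = self._fetch(filename, index * self._chunk_size,
                               self._chunk_size)
        except BaseException as exc:
            with self._lock:
                del self._inflight[key]
            fut.set_exception(exc)
            raise
        with self._lock:
            self._cache[key] = data
            while len(self._cache) > self._capacity:
                self._cache.popitem(last=False)  # evict least recently used
            del self._inflight[key]
        fut.set_result(data)
        return data

    def read(self, filename: str, offset: int, length: int) -> bytes:
        out = bytearray()
        end = offset + length
        pos = offset
        while pos < end:
            index = pos // self._chunk_size
            start = index * self._chunk_size
            chunk = self._get_chunk(filename, index)
            piece = chunk[pos - start:end - start]
            if not piece:  # short or empty chunk: end of file reached
                break
            out += piece
            pos += len(piece)
        return bytes(out)
```

The lock is held only for map bookkeeping, so slow downloads do not block lookups of other chunks. A fuller answer would likely add a semaphore or token bucket around `fetch` for backpressure, and a version tag (such as an ETag or modification time, if the remote service exposes one) in the cache key for validation.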