PracHub

Design concurrent range-aware file caching client

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of concurrent, range-aware client-side file caching: concurrency control and deduplication of in-flight requests, chunking and prefetch strategies, persistence, eviction policies, validation, and reliability.

  • hard
  • Databricks
  • System Design
  • Software Engineer


Company: Databricks

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: HR Screen

Design a high-throughput client-side file cache that serves ranged reads from a remote storage service. The remote API supports fetching by filename with byte offset and length; you also have a method that downloads an entire file by filename and size. Requirements: support concurrent requests for overlapping/adjacent ranges efficiently; choose chunk size, request coalescing, and prefetching strategies; ensure thread-safe access, deduplicate overlapping in-flight downloads, and provide backpressure/rate limiting; implement cache indexing, eviction (e.g., LRU), persistence, and validation if the remote file changes; handle partial failures, retries, and timeouts; define and justify API semantics (e.g., read(filename, offset, length) and write policy). Provide key data structures and pseudocode for coordinating fetch, cache lookup, and concurrency control. Analyze scalability, correctness under concurrency, and time/space complexity.


Related Interview Questions

  • Design a Book Price Aggregator - Databricks (medium)
  • Design a stock order manager - Databricks (medium)
  • Design an Online Bookstore - Databricks (hard)
  • Design a Hierarchical File System - Databricks (hard)
  • Design a Visa-like payment processing system - Databricks (hard)
Posted: Aug 9, 2025

System Design: High-throughput Client-side Ranged-read File Cache

You are asked to design and specify a client-side cache that accelerates ranged reads against a remote storage service.

Context and Assumptions

  • The remote service supports HTTP Range-like reads: fetch(filename, offset, length).
  • There is also an API to download an entire file by filename and known size.
  • Workloads consist of many concurrent ranged reads, often overlapping or adjacent.
  • The cache persists on local disk across process restarts.
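The two remote calls assumed above can be pinned down as a small interface before designing the cache. A minimal sketch in Python; the names `RemoteStore`, `fetch_range`, and `fetch_file` are illustrative, not from any real SDK, and `InMemoryStore` is a toy stand-in for the remote service:

```python
from typing import Protocol


class RemoteStore(Protocol):
    """Illustrative remote-storage interface (names are assumptions)."""

    def fetch_range(self, filename: str, offset: int, length: int) -> bytes:
        """Fetch `length` bytes starting at `offset` (HTTP Range-style)."""
        ...

    def fetch_file(self, filename: str, size: int) -> bytes:
        """Download the whole file when its size is already known."""
        ...


class InMemoryStore:
    """Toy implementation used to exercise the interface locally."""

    def __init__(self, files: dict):
        self.files = files  # filename -> full file contents

    def fetch_range(self, filename: str, offset: int, length: int) -> bytes:
        return self.files[filename][offset:offset + length]

    def fetch_file(self, filename: str, size: int) -> bytes:
        return self.files[filename][:size]
```

Separating the interface from the cache keeps the coalescing, eviction, and retry layers testable against an in-memory fake.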

Requirements

  1. Concurrency and coalescing
    • Efficiently handle concurrent requests for overlapping/adjacent ranges.
    • Deduplicate overlapping in-flight downloads so only one fetch per chunk occurs.
    • Provide thread-safe access and correctness under concurrency.
  2. Chunking, request strategy, and prefetch
    • Choose chunk size and justify trade-offs.
    • Coalesce adjacent or overlapping requests to reduce remote calls, keeping any over-fetch within sensible bounds.
    • Implement prefetch/read-ahead strategies for sequential access.
  3. Backpressure and rate limiting
    • Limit remote QPS/throughput and disk I/O concurrency.
    • Disable prefetch under load; bound queues.
  4. Cache management
    • Cache indexing and lookup.
    • Eviction policy (e.g., LRU) with byte budget.
    • Persistence across restarts.
    • Validation and invalidation if the remote file changes.
  5. Reliability
    • Handle partial failures, retries, and timeouts.
    • Resume or refetch partial chunks.
  6. API semantics
    • Define read(filename, offset, length) (sync or async), and streaming semantics if applicable.
    • Define write policy (e.g., read-through only, or write-through with invalidation).
  7. Deliverables
    • Describe key data structures and on-disk layout.
    • Provide pseudocode for coordinating fetch, cache lookup, coalescing, and concurrency control.
    • Analyze scalability, correctness under concurrency, and time/space complexity.
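For requirement 1, a common pattern is to align every read to fixed-size chunks and keep a map from chunk key to an in-flight future, so concurrent readers of overlapping ranges block on the same download instead of duplicating it. A hedged sketch, assuming a 4 MiB chunk size (a tuning choice, not a given) and a caller-supplied `fetch_chunk` callback:

```python
import threading
from concurrent.futures import Future

CHUNK_SIZE = 4 * 1024 * 1024  # assumed default; the real value is a tuning decision


class ChunkCoordinator:
    """Deduplicates in-flight fetches: one download per (filename, chunk_index)."""

    def __init__(self, fetch_chunk):
        self._fetch_chunk = fetch_chunk  # callback: (filename, index) -> bytes
        self._lock = threading.Lock()
        self._inflight = {}              # (filename, index) -> Future
        self.fetch_count = 0             # instrumentation only

    def get_chunk(self, filename, index):
        key = (filename, index)
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:                   # first reader becomes the fetcher
                fut = Future()
                self._inflight[key] = fut
                self.fetch_count += 1
        if leader:
            try:
                fut.set_result(self._fetch_chunk(filename, index))
            except Exception as exc:
                fut.set_exception(exc)
            finally:
                with self._lock:
                    del self._inflight[key]  # a real cache would now persist the bytes
        return fut.result()              # followers block here until the fetch completes

    def read(self, filename, offset, length):
        """Chunk-aligned ranged read assembled from get_chunk calls."""
        first = offset // CHUNK_SIZE
        last = (offset + length - 1) // CHUNK_SIZE
        out = bytearray()
        for index in range(first, last + 1):
            chunk = self.get_chunk(filename, index)
            start = max(offset - index * CHUNK_SIZE, 0)
            end = min(offset + length - index * CHUNK_SIZE, len(chunk))
            out += chunk[start:end]
        return bytes(out)
```

Holding the lock only around the map (never around the network call) keeps the critical section short; the `Future` carries both results and exceptions to all waiting readers.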
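For requirement 4, a byte-budgeted LRU over cached chunks is the usual baseline. A sketch using `OrderedDict`; the on-disk side (e.g. one file per chunk plus a small manifest recording the remote ETag or mtime for validation) is assumed rather than shown:

```python
from collections import OrderedDict


class LRUByteCache:
    """In-memory LRU keyed by (filename, chunk_index), bounded by total bytes."""

    def __init__(self, byte_budget: int):
        self.byte_budget = byte_budget
        self._entries = OrderedDict()  # key -> chunk bytes, oldest first
        self._bytes = 0

    def get(self, key):
        data = self._entries.get(key)
        if data is not None:
            self._entries.move_to_end(key)  # mark most-recently-used
        return data

    def put(self, key, data: bytes):
        if key in self._entries:
            self._bytes -= len(self._entries.pop(key))
        self._entries[key] = data
        self._bytes += len(data)
        while self._bytes > self.byte_budget:
            _, evicted = self._entries.popitem(last=False)  # evict least-recently-used
            self._bytes -= len(evicted)

    def invalidate_file(self, filename: str):
        """Drop every chunk of a file whose remote version changed."""
        for key in [k for k in self._entries if k[0] == filename]:
            self._bytes -= len(self._entries.pop(key))
```

Budgeting in bytes rather than entry count matters here because chunks at file tails can be smaller than `CHUNK_SIZE`; per-file invalidation supports the change-detection requirement.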
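Requirements 3 and 5 combine naturally at the fetch boundary: a semaphore bounds concurrent remote calls (backpressure), and each call retries transient failures with exponential backoff. A sketch under assumed limits; the constants and the set of retryable exceptions are illustrative:

```python
import random
import threading
import time

MAX_CONCURRENT_FETCHES = 8   # assumed bound on in-flight remote calls
MAX_ATTEMPTS = 3             # assumed retry budget per fetch
BASE_BACKOFF_S = 0.05        # assumed base delay for exponential backoff

_fetch_slots = threading.Semaphore(MAX_CONCURRENT_FETCHES)


def fetch_with_retry(fetch, *args):
    """Run `fetch(*args)` under the concurrency bound, retrying transient errors."""
    with _fetch_slots:  # blocks callers when too many fetches are already in flight
        for attempt in range(MAX_ATTEMPTS):
            try:
                return fetch(*args)
            except (TimeoutError, ConnectionError):
                if attempt == MAX_ATTEMPTS - 1:
                    raise  # budget exhausted; surface the failure to the reader
                # exponential backoff with jitter before the next attempt
                time.sleep(BASE_BACKOFF_S * (2 ** attempt) * random.random())
```

Acquiring the semaphore before retrying (rather than per attempt) keeps a flaky endpoint from consuming more than its one slot; jitter avoids synchronized retry storms across threads.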

