Design cache for DAG-based query views

This question evaluates a candidate's understanding of caching, consistency, versioning, eviction, and update propagation in DAG-based materialized views, testing competencies in distributed systems, data engineering, and storage-hierarchy design.

Question

System Design: Caching Strategy for a DAG of Materialized Views

Context

You are designing an analytics system that computes materialized query views. Views depend on other views (and base tables), forming a directed acyclic graph (DAG). The goal is to reduce query latency while maintaining correctness and controllable freshness.

Assume:

- Base data lives in durable storage and is updated in micro-batches or via streaming change data capture (CDC).
- Queries commonly read recent time windows and hot dimensions.
- The system is multi-tenant and runs on a fleet of compute nodes, each with local memory and disk; a shared distributed cache is also available.
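
To make the dependency structure in the Context concrete, here is a minimal sketch of a view DAG, assuming lineage is available as a mapping from each view to its direct inputs. The view and table names (`orders`, `daily_revenue`, and so on) and both helper functions are hypothetical, chosen only for illustration. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+) to derive a refresh order, and a simple traversal to find what a base-table change affects.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each view maps to the inputs it reads from.
# Names that appear only as inputs (orders, payments, regions) are base tables.
DEPENDS_ON = {
    "daily_revenue": {"orders", "payments"},
    "revenue_by_region": {"daily_revenue", "regions"},
    "exec_dashboard": {"revenue_by_region", "daily_revenue"},
}


def refresh_order(depends_on):
    """Return nodes ordered so every input precedes the views built on it."""
    return list(TopologicalSorter(depends_on).static_order())


def downstream_of(changed, depends_on):
    """Return every view that transitively reads from `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        node = frontier.pop()
        for view, inputs in depends_on.items():
            if node in inputs and view not in affected:
                affected.add(view)
                frontier.append(view)
    return affected


if __name__ == "__main__":
    print(refresh_order(DEPENDS_ON))
    print(sorted(downstream_of("regions", DEPENDS_ON)))
```

A change to `regions` touches only `revenue_by_region` and `exec_dashboard`, which is the set an invalidation or incremental update would need to consider under these assumptions.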

Task

Design a caching strategy for this system. Specify:

What to cache
- Choose cache granularity among: base tables (blocks/columns), partial aggregates (intermediate DAG nodes), and full view results. Explain trade-offs.
Where to cache
- In-memory (per-node), local SSD, and/or distributed cache. Propose a multi-tier policy.
Cache keys and versioning
- Define keys so cached entries are uniquely and correctly identified. Include versioning that ties entries to specific input data snapshots.
Eviction policy
- Specify the algorithm(s), admission control, quotas, and safeguards against thrash.
Freshness SLAs
- Define staleness guarantees (e.g., strong consistency vs. bounded staleness) and how queries select acceptable cached versions.
Consistency and invalidation on updates
- Explain how caches are invalidated or updated when base data changes, and describe mechanisms to avoid stale or inconsistent joins across views.
Update propagation through the DAG
- Explain incremental vs. full recomputation, the order in which nodes are refreshed, and how lineage is tracked.
Failure handling
- Partial failures, retries, circuit breakers, and fallback behavior.
Backfills
- Strategy for large historical recomputations without disrupting hot-path queries.
Hot keys and load shedding
- Handling hotspots, thundering herds, and skew.
Monitoring and validation
- Metrics and techniques to verify correctness and performance.

Provide a step-by-step, implementation-oriented design with examples where helpful.
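
As one illustration of what the "Cache keys and versioning" item asks for, the sketch below derives a key from the tenant, the view name, the query parameters, and the snapshot version of every input. This is one possible scheme under stated assumptions, not a prescribed answer: the function name `cache_key`, the field names, and the idea that each input exposes a monotonically increasing snapshot version are all hypothetical.

```python
import hashlib
import json


def cache_key(tenant, view, params, input_versions):
    """Build a deterministic key tied to specific input snapshots.

    `input_versions` maps each upstream table or view to the snapshot
    version the cached result was computed from (an assumed convention).
    """
    payload = json.dumps(
        {"tenant": tenant, "view": view, "params": params, "inputs": input_versions},
        sort_keys=True,            # make the key independent of dict ordering
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Keep tenant and view readable in the key for debugging and quotas.
    return f"{tenant}:{view}:{digest[:32]}"


if __name__ == "__main__":
    key_a = cache_key("t1", "daily_revenue", {"window": "7d"}, {"orders": 41, "payments": 17})
    key_b = cache_key("t1", "daily_revenue", {"window": "7d"}, {"orders": 42, "payments": 17})
    print(key_a)
    print(key_a != key_b)
```

Because the input versions are part of the key, a new snapshot of any input yields a different key, so entries computed from older snapshots are never returned for a lookup against newer ones; they simply age out under whatever eviction policy is chosen.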
