System Design: Caching Strategy for a DAG of Materialized Views
Context
You are designing an analytics system that computes materialized query views. Views depend on other views (and base tables), forming a directed acyclic graph (DAG). The goal is to reduce query latency while maintaining correctness and controllable freshness.
Assume:
- Base data lives in durable storage and is updated in micro-batches or streaming (CDC).
- Queries commonly read recent time windows and hot dimensions.
- The system is multi-tenant and runs on a fleet of compute nodes with local memory and disk; a shared distributed cache is available.
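To make the setup concrete, here is a minimal sketch of how the view DAG might be modeled and how the views affected by a change to one input could be listed in a safe recompute order. The view and table names (`orders`, `daily_orders`, and so on) and the helper `affected_views` are hypothetical and used only for illustration; a real design would likely track lineage per partition rather than per whole view.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each view -> the inputs (views or base tables) it reads.
DEPS = {
    "daily_orders": {"orders"},
    "daily_revenue": {"daily_orders", "fx_rates"},
    "tenant_dashboard": {"daily_revenue", "customers"},
}


def affected_views(deps, changed):
    """Return the views downstream of `changed`, inputs before dependents."""
    # static_order() yields every node after all of its predecessors.
    order = TopologicalSorter(deps).static_order()
    dirty = {changed}
    result = []
    for node in order:
        # Base tables have no entry in `deps`; only views can become dirty here.
        if node in deps and deps[node] & dirty:
            dirty.add(node)
            result.append(node)
    return result


print(affected_views(DEPS, "fx_rates"))
```

Under these assumptions, a change to `fx_rates` marks only `daily_revenue` and `tenant_dashboard` as needing refresh, while `daily_orders` can keep serving from cache.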
Task
Design a caching strategy for this system. Specify:
- What to cache
  - Choose cache granularity among: base tables (blocks/columns), partial aggregates (intermediate DAG nodes), and full view results. Explain trade-offs.
- Where to cache
  - In-memory (per-node), local SSD, and/or distributed cache. Propose a multi-tier policy.
- Cache keys and versioning
  - Define keys so cached entries are uniquely and correctly identified. Include versioning that ties entries to specific input data snapshots.
- Eviction policy
  - Specify the algorithm(s), admission control, quotas, and safeguards against thrash.
- Freshness SLAs
  - Define staleness guarantees (e.g., strong vs bounded) and how queries select acceptable cached versions.
- Consistency and invalidation on updates
  - How caches are invalidated/updated when base data changes. Describe mechanisms to avoid stale or inconsistent joins across views.
- Update propagation through the DAG
  - Explain incremental vs full recomputation, ordering, and how lineage is tracked.
- Failure handling
  - Partial failures, retries, circuit breakers, and fallback behavior.
- Backfills
  - Strategy for large historical recomputations without disrupting hot-path queries.
- Hot keys and load shedding
  - Handling hotspots, thundering herds, and skew.
- Monitoring and validation
  - Metrics and techniques to verify correctness and performance.
Provide a step-by-step, implementation-oriented design with examples where helpful.
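
As an illustration of the level of detail expected for the "Cache keys and versioning" item, the following is one possible sketch, not a prescribed answer. It assumes each input exposes a monotonically increasing snapshot identifier and that each view definition carries a version string; the function name `cache_key` and all of its parameters are hypothetical.

```python
import hashlib
import json


def cache_key(tenant, view, definition_version, input_snapshots, partition):
    """Build a deterministic key for one cached partition of a view.

    input_snapshots maps each input name to the snapshot id that was read
    (an assumed scheme), so a new snapshot of any input yields a new key.
    """
    payload = json.dumps(
        {
            "tenant": tenant,
            "view": view,
            "def": definition_version,
            # Sort the inputs so dict insertion order cannot change the key.
            "inputs": sorted(input_snapshots.items()),
            "partition": partition,
        },
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    # Readable prefix for per-tenant quotas and debugging; digest for uniqueness.
    return f"{tenant}/{view}/{partition}/{digest}"


key = cache_key("t42", "daily_revenue", "v3",
                {"daily_orders": 1007, "fx_rates": 55}, "2024-05-01")
print(key)
```

Because the snapshot identifiers are part of the key, entries built from older inputs are never overwritten or served by mistake; they simply stop being referenced and can age out through eviction. Whether to truncate the digest, and to what length, is a design choice that trades key size against collision risk.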