Design a Caching Strategy for a DAG of Computed Views
Context
You operate a cloud query engine where logical views are defined over base tables and/or other views. These definitions form a Directed Acyclic Graph (DAG): leaves are base tables; internal nodes are computed views (e.g., projections, filters, joins, aggregations). Queries typically target one or more root views.
Assume:
- MVCC snapshots for reads; commits produce new versions of base table partitions.
- Compute is elastic/stateless; data lives in remote object storage; metadata service tracks lineage.
- Change data (per-partition/table) is available to support incremental maintenance.
Task
Design a caching strategy to reduce query latency. Address the following:
- What to cache
  - Whole query results vs. partial subplans (intermediate/materialized views).
- Where to place caches
  - Client/session, query service/control plane, or storage/compute nodes.
- Update/invalidation
  - How caches are updated/invalidated when upstream nodes change.
- Consistency & staleness
  - Guarantees (strong/snapshot vs. bounded-staleness), versioning, TTLs.
- Maintenance
  - Incremental/materialized view maintenance and invalidation triggers.
- Admission/eviction
  - Cost-based policies for admitting/evicting hot subgraphs.
Discuss design trade-offs and include mechanisms such as versioning keys, triggers, and cost models.
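As one illustration of the "versioning keys" mechanism, here is a minimal sketch, not a reference answer. It assumes each base table exposes a single integer snapshot version and each view has a definition string. The names `DEPS`, `DEFS`, `version_key`, and `keys_for_snapshot`, along with the example tables and views, are hypothetical. A view's cache key is a hash of its definition plus the keys of its inputs, so a commit to a base table changes the keys of exactly the views downstream of it.

```python
import hashlib
from typing import Dict, List, Optional

# Hypothetical lineage: view name -> ordered list of inputs (tables or views).
DEPS: Dict[str, List[str]] = {
    "v_recent_orders": ["orders"],
    "v_customer_names": ["customers"],
    "v_order_report": ["v_recent_orders", "v_customer_names"],
}

# Hypothetical view definitions; in practice this might be a normalized plan.
DEFS: Dict[str, str] = {
    "v_recent_orders": "filter(orders, day >= today - 7)",
    "v_customer_names": "project(customers, id, name)",
    "v_order_report": "join(v_recent_orders, v_customer_names, on=id)",
}


def version_key(
    node: str,
    base_versions: Dict[str, int],
    memo: Optional[Dict[str, str]] = None,
) -> str:
    """Return a cache key for `node` at the snapshot given by `base_versions`."""
    if memo is None:
        memo = {}
    if node in memo:
        return memo[node]
    if node in base_versions:
        # Leaf: the key is determined by the table name and its version.
        payload = f"table:{node}@{base_versions[node]}"
    else:
        # Internal node: combine the definition with the keys of all inputs.
        child_keys = [version_key(c, base_versions, memo) for c in DEPS[node]]
        payload = "view:" + DEFS[node] + "|" + ",".join(child_keys)
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    memo[node] = key
    return key


def keys_for_snapshot(base_versions: Dict[str, int]) -> Dict[str, str]:
    """Compute cache keys for every view in the DAG at one snapshot."""
    memo: Dict[str, str] = {}
    return {view: version_key(view, base_versions, memo) for view in DEPS}


before = keys_for_snapshot({"orders": 1, "customers": 1})
after = keys_for_snapshot({"orders": 2, "customers": 1})  # commit to orders

# Views whose key changed can no longer be served from entries cached before.
stale = sorted(v for v in DEPS if before[v] != after[v])
print(stale)
```

With keys of this shape, invalidation needs no explicit trigger: entries under old keys simply stop being looked up and can age out through eviction. The trade-off is that a changed key forces a full recompute unless the design also uses the change data for incremental maintenance.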