Design cache for DAG-based query views

This is a System Design interview question from Snowflake for Software Engineer roles.

Question

Design a Caching Strategy for a DAG of Computed Views

Context

You operate a cloud query engine where logical views are defined over base tables and/or other views. These definitions form a Directed Acyclic Graph (DAG): leaves are base tables; internal nodes are computed views (e.g., projections, filters, joins, aggregations). Queries typically target one or more root views.
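To make the DAG concrete, here is a minimal sketch of how view lineage could be represented and walked to find everything affected by a change. The view and table names (`orders`, `daily_sales`, and so on) and the `downstream_views` helper are hypothetical illustrations, not part of the question.

```python
from collections import defaultdict, deque

# Hypothetical lineage: each view maps to the tables/views it reads.
# Names that never appear as keys are treated as base tables (leaves).
VIEW_INPUTS = {
    "daily_sales": ["orders", "customers"],
    "regional_sales": ["daily_sales", "regions"],
    "top_customers": ["daily_sales"],
}

def downstream_views(view_inputs, changed):
    """Return every view that transitively depends on `changed`, in BFS order."""
    # Invert the "reads from" edges so we can walk from a source to its readers.
    dependents = defaultdict(list)
    for view, inputs in view_inputs.items():
        for src in inputs:
            dependents[src].append(view)

    seen, order, queue = set(), [], deque([changed])
    while queue:
        node = queue.popleft()
        for view in dependents[node]:
            if view not in seen:
                seen.add(view)
                order.append(view)
                queue.append(view)
    return order

print(downstream_views(VIEW_INPUTS, "orders"))
```

A commit to `orders` reaches all three views here, while a commit to `regions` reaches only `regional_sales`, which is the set of cached entries an invalidation trigger would need to consider.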

Assume:

- MVCC snapshots for reads; commits produce new versions of base table partitions.
- Compute is elastic/stateless; data lives in remote object storage; metadata service tracks lineage.
- Change data (per-partition/table) is available to support incremental maintenance.
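Under these assumptions, one possible way to tie a cached result to an MVCC snapshot is a versioned cache key. The sketch below assumes each base table exposes a single integer version per snapshot (the question speaks of per-partition versions, so this is a simplification), and the `cache_key` and `base_tables` helpers, the lineage map, and the SQL strings are all hypothetical.

```python
import hashlib

# Hypothetical lineage: view -> inputs; anything not a key is a base table.
VIEW_INPUTS = {
    "daily_sales": ["orders", "customers"],
    "regional_sales": ["daily_sales", "regions"],
}

def base_tables(view_inputs, node):
    """Collect the leaf base tables reachable from `node`."""
    if node not in view_inputs:  # not a view, so treat it as a base table
        return {node}
    leaves = set()
    for src in view_inputs[node]:
        leaves |= base_tables(view_inputs, src)
    return leaves

def cache_key(view_inputs, view, definition_sql, snapshot):
    """Hash the view definition together with the versions of the tables it reads."""
    parts = [view, definition_sql]
    for table in sorted(base_tables(view_inputs, view)):
        parts.append(f"{table}@{snapshot[table]}")
    return hashlib.sha256("\x1f".join(parts).encode()).hexdigest()

snap_a = {"orders": 7, "customers": 3, "regions": 1}
snap_b = {**snap_a, "regions": 2}  # only `regions` committed a new version

sql = "SELECT day, SUM(amount) FROM orders JOIN customers USING (cust_id) GROUP BY day"
# `daily_sales` does not read `regions`, so its key is unchanged across snapshots.
print(cache_key(VIEW_INPUTS, "daily_sales", sql, snap_a) ==
      cache_key(VIEW_INPUTS, "daily_sales", sql, snap_b))
```

With keys like this, a new commit does not require deleting old entries: readers at a newer snapshot simply compute a different key, and stale entries age out through eviction.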

Task

Design a caching strategy to reduce query latency. Address the following:

What to cache
- Whole query results vs. partial subplans (intermediate/materialized views).
Where to place caches
- Client/session, query service/control plane, or storage/compute nodes.
Update/invalidation
- How caches are updated/invalidated when upstream nodes change.
Consistency & staleness
- Guarantees (strong/snapshot vs. bounded-staleness), versioning, TTLs.
Maintenance
- Incremental/materialized view maintenance and invalidation triggers.
Admission/eviction
- Cost-based policies for admitting/evicting hot subgraphs.
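As one illustration of a cost-based admission/eviction policy, the sketch below scores each cached entry by estimated recompute time saved per byte stored and evicts the lowest-scoring entries first. The scoring formula, the `Entry` fields, and the sample numbers are assumptions chosen for illustration; a real answer might also weigh freshness, maintenance cost, or position in the DAG.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    recompute_seconds: float  # estimated cost to rebuild this result
    hits_per_hour: float      # observed or predicted reuse rate
    size_bytes: int           # storage footprint, assumed > 0

    def score(self):
        # Benefit density: compute time saved per hour, per byte of cache used.
        return self.recompute_seconds * self.hits_per_hour / self.size_bytes

def choose_evictions(entries, bytes_needed):
    """Pick lowest-score entries until at least `bytes_needed` would be freed."""
    freed, victims = 0, []
    for entry in sorted(entries, key=Entry.score):
        if freed >= bytes_needed:
            break
        victims.append(entry.name)
        freed += entry.size_bytes
    return victims

entries = [
    Entry("hot_join", recompute_seconds=10.0, hits_per_hour=100.0, size_bytes=1000),
    Entry("cold_scan", recompute_seconds=1.0, hits_per_hour=1.0, size_bytes=1000),
    Entry("warm_agg", recompute_seconds=5.0, hits_per_hour=10.0, size_bytes=500),
]
print(choose_evictions(entries, bytes_needed=1200))
```

The same score can serve as an admission test: a new subplan result is admitted only if its score exceeds that of the entries it would displace.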

Discuss design trade-offs and include mechanisms such as versioning keys, triggers, and cost models.
