Database Internals Deep Dive
You are interviewing for a cloud database engine team (Aurora-like). Answer conceptually and with trade-offs.
Questions
-
Architecture
: How does a cloud-native database architecture (separating compute from distributed storage) differ from a traditional single-node database using local disks? What are the benefits and costs?
-
Replication model
: Compare replication approaches:
-
physical vs logical replication
-
statement-based vs row-based vs log-based
-
synchronous vs asynchronous
-
quorum-based durability in distributed storage
When would you choose each, and what failure modes do they address?
-
WAL / redo log
: What is the role of WAL (write-ahead logging) in durability and crash recovery? How do redo/undo relate to WAL? What must be forced to durable storage on commit?
-
Crash recovery
: Walk through a typical recovery procedure after a crash (phases, checkpoints, replay). What changes when storage is distributed?
-
MVCC
: Explain MVCC fundamentals:
-
how reads avoid blocking writes
-
snapshot visibility rules (high level)
-
what metadata is stored per row/version
-
High-concurrency writes
: Under heavy write load, how do lock conflicts happen and how can an engine reduce contention (e.g., latches vs locks, optimistic CC, partitioning/hotspot mitigation)?
-
Trade-offs
: Discuss consistency vs latency vs throughput trade-offs in replication and commit paths (e.g., quorum write, 1-RTT vs 2-RTT commit, read-your-writes, failover behavior).