System Design: Migrate From Relational DB to a NoSQL Key-Value Store with Snapshot + CDC
Context
You need to migrate data from an OLTP relational database to a NoSQL key-value store for read-optimized workloads. The target representation is a JSON document per primary key, stored as {primary_key: value} in the key-value store. The system must first take a one-time consistent snapshot of the entire dataset, then continuously apply changes captured from the database's write-ahead log (CDC).
Assume:
-
Source DB: a relational database with MVCC and logical change streams (e.g., PostgreSQL logical replication or MySQL binlog).
-
Transport: a durable, partitioned queue supporting ordered delivery per key (e.g., Kafka/Pub/Sub).
-
Sink: a scalable key-value store (e.g., DynamoDB, Bigtable, Cassandra, Redis), supporting conditional updates or a way to emulate them.
Requirements
Design and explain a system that:
-
Performs a one-time, consistent snapshot, converting each row into {primary_key: value} JSON.
-
Continues with CDC via the write-ahead log to capture inserts, updates, and deletes.
-
Ensures:
-
Snapshot consistency
-
Ordering with CDC (especially per-key ordering)
-
Idempotency
-
Exactly-once semantics across replays (or effective-once)
-
Schema evolution handling
-
Backfill catch-up from snapshot to CDC
-
Failure recovery
-
Includes details on queueing, partitioning, and monitoring.
Deliverables:
-
High-level architecture
-
Key data model choices (key format, value JSON, metadata)
-
Algorithms for snapshot, CDC ingestion, and sink application
-
Explanations and guardrails for the above correctness and operational concerns