Design an in-memory KV DB with TTL, scans, backup
Company: Circle
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Take-home Project
Design and implement an in-memory key→hash database. Requirements:
1) Data model and basic ops: Store data as key -> {field -> value}. Provide set(key, field, value, timestamp, ttl) and get(key, field, timestamp). A field expires at write_timestamp + ttl; if ttl is null or non-positive, treat accordingly and define expected behavior.
2) Scans: Implement scan(key, timestamp) -> list of (field, value) for fields not expired at timestamp; and scanPrefix(key, prefix, timestamp) -> same but only fields whose field name starts with prefix. Results must be lexicographically sorted by the raw field names. Ensure correct ordering when prefixes or stored field names contain characters such as parentheses or other delimiters; describe how you will normalize or tokenize inputs (e.g., keep raw field strings, split encoded strings before sorting, or store field names separately from any serialized metadata) to avoid incorrect lexicographic order.
3) TTL semantics: All read operations (get/scan/scanPrefix) must filter out expired entries as of the provided timestamp; describe how to store per-field expiry and how to garbage-collect or lazily purge expired data.
4) Backup/restore counters: Implement backup(timestamp) -> returns the count of unexpired fields in the database at that timestamp; implement restore(timestamp) -> returns the count from the most recent backup without mutating data state. Clarify edge cases (no prior backup, concurrent writes during backup) and whether backup/restore affect persistence or are purely metadata operations.
5) Performance and design: Specify time and space complexities for set/get/scan/scanPrefix/backup/restore. Propose data structures (e.g., per-key hash map plus per-key sorted index or trie for field names; min-heap or time wheel for expirations) and justify choices. Discuss concurrency, memory footprint, and how you would test corner cases (special characters in prefixes, very large numbers of fields, and mass expirations).
Quick Answer: This question evaluates the ability to design an in-memory key-value store with per-field TTL, lexicographic scans, and backup/restore counters, exercising knowledge of data structures, time-based consistency, indexing, and performance trade-offs.