In-Memory Key→Hash Database: Design and Implementation
Context
Build a single-process, in-memory data store that supports a Redis-like hash model with per-field TTL, lexicographic scans (including prefix scans), and a simple backup/restore counter mechanism. All read operations are evaluated "as of" a supplied timestamp.
Assume timestamps are monotonically increasing logical times (e.g., epoch milliseconds as 64-bit integers). If multiple writes occur to the same field at different timestamps, reads at time t should observe the latest write with write_timestamp ≤ t, subject to TTL.
Requirements
- Data model and basic operations
  - Store data as key -> { field -> value }, with a TTL attached to each field.
  - set(key, field, value, write_timestamp, ttl)
    - A field expires at expiry = write_timestamp + ttl.
    - Define the behavior when ttl is null or non-positive.
  - get(key, field, timestamp) -> value or null
    - Return the value that is visible and unexpired at the provided timestamp (as-of semantics). A sketch of these operations follows this list.
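Below is a minimal sketch of set/get, assuming operations arrive in non-decreasing timestamp order so that only the latest write per field needs to be retained (full as-of reads over arbitrary past timestamps would require per-field version history). The `Store` and `_Entry` names and the ttl=None-means-never-expires convention are illustrative choices, not part of the spec.

```python
from dataclasses import dataclass
from typing import Optional
import math


@dataclass
class _Entry:
    value: str
    write_ts: int
    expiry: float  # write_ts + ttl, or math.inf when ttl is null


class Store:
    def __init__(self) -> None:
        # key -> { field -> latest entry for that field }
        self._data: dict[str, dict[str, _Entry]] = {}

    def set(self, key: str, field: str, value: str,
            write_timestamp: int, ttl: Optional[int] = None) -> None:
        # One possible convention: ttl=None means "never expires"; a
        # non-positive ttl gives expiry <= write_timestamp, so the write
        # is never visible to any read.
        expiry = math.inf if ttl is None else write_timestamp + ttl
        self._data.setdefault(key, {})[field] = _Entry(value, write_timestamp, expiry)

    def get(self, key: str, field: str, timestamp: int) -> Optional[str]:
        entry = self._data.get(key, {}).get(field)
        if entry is None:
            return None
        # Visible iff written at or before `timestamp` and not yet expired.
        return entry.value if entry.write_ts <= timestamp < entry.expiry else None
```

With this layout, set and get are both O(1) expected time per call.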
- Scans
  - scan(key, timestamp) -> list of (field, value) for the fields unexpired at timestamp.
  - scanPrefix(key, prefix, timestamp) -> the same, restricted to fields whose names start with prefix.
  - Results must be sorted lexicographically by the raw field names. Ensure the ordering remains correct when prefixes or stored field names contain special characters (parentheses, punctuation, delimiters), and clearly describe how inputs are normalized or tokenized (or deliberately compared as raw strings) to avoid incorrect lexicographic order. One possible approach is sketched after this list.
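One possible realization of the scans, continuing the `Store` sketch above: Python compares strings by raw Unicode code point, so sorting the unmodified field names yields a deterministic order even with parentheses or other punctuation, and no normalization or tokenization step is applied.

```python
def scan(store: Store, key: str, timestamp: int) -> list[tuple[str, str]]:
    # Sort by the raw field name; Python's string comparison is plain
    # codepoint order, so punctuation needs no special handling.
    fields = store._data.get(key, {})
    return [(f, e.value) for f, e in sorted(fields.items())
            if e.write_ts <= timestamp < e.expiry]


def scan_prefix(store: Store, key: str, prefix: str,
                timestamp: int) -> list[tuple[str, str]]:
    # str.startswith also compares raw codepoints, so a prefix such
    # as "a(" requires no escaping.
    return [(f, v) for f, v in scan(store, key, timestamp)
            if f.startswith(prefix)]
```

Sorting on every call costs O(n log n) in the number of live fields; a per-key sorted index, as suggested under Performance and design, would amortize that cost across writes.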
- TTL semantics
  - All reads (get/scan/scanPrefix) must filter out entries that are expired as of the provided timestamp.
  - Describe how per-field expiry is stored and how expired data is garbage-collected or lazily purged; one candidate scheme appears below.
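One candidate purge scheme, assuming the `Store` sketch above: each write additionally pushes (expiry, key, field) onto a global min-heap. Reads stay correct regardless, because they re-check expiry lazily; a periodic sweep then pops due entries, skipping any left stale by a later overwrite. `ExpiryIndex` is a hypothetical helper name, not part of the spec.

```python
import heapq


class ExpiryIndex:
    def __init__(self) -> None:
        # Min-heap of (expiry, key, field); entries may become stale
        # when a field is overwritten with a different expiry.
        self._heap: list[tuple[float, str, str]] = []

    def track(self, expiry: float, key: str, field: str) -> None:
        heapq.heappush(self._heap, (expiry, key, field))

    def purge(self, data: dict, now: int) -> int:
        """Delete entries whose expiry is <= now; return how many were removed."""
        removed = 0
        while self._heap and self._heap[0][0] <= now:
            expiry, key, field = heapq.heappop(self._heap)
            entry = data.get(key, {}).get(field)
            # Only delete if the live record still matches this heap entry;
            # otherwise it was overwritten and the heap entry is stale.
            if entry is not None and entry.expiry == expiry:
                del data[key][field]
                removed += 1
        return removed
```

Each entry is pushed and popped at most once, so purging is amortized O(log n) per write.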
- Backup/restore counters
  - backup(timestamp) -> returns the count of unexpired fields in the entire database at that timestamp (each field counted at most once per key).
  - restore(timestamp) -> returns the count from the most recent backup (clarify how the provided timestamp is used) without mutating data state.
  - Clarify edge cases: no prior backup, concurrent writes during a backup, and whether backup/restore affect persistence or are purely metadata operations. A sketch follows this list.
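A sketch of the counters, layered on the `Store` above; the `history` list of (backup_timestamp, count) pairs is an assumption of this sketch. Here restore(t) is read as "the count from the most recent backup taken at or before t", returning 0 when no backup qualifies, which is one defensible edge-case policy. Neither call mutates the stored data.

```python
def backup(store: Store, history: list[tuple[int, int]], timestamp: int) -> int:
    # Count every field alive at `timestamp`, once per (key, field) pair.
    count = sum(
        1
        for fields in store._data.values()
        for e in fields.values()
        if e.write_ts <= timestamp < e.expiry
    )
    history.append((timestamp, count))
    return count


def restore(history: list[tuple[int, int]], timestamp: int) -> int:
    # Replay the most recent recorded count at or before `timestamp`;
    # 0 when no prior backup exists (assumed policy, not mandated).
    best = 0
    for ts, count in history:
        if ts <= timestamp:
            best = count
    return best
```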
- Performance and design
  - Specify time and space complexities for set/get/scan/scanPrefix/backup/restore.
  - Propose data structures (e.g., a per-key hash map plus a per-key sorted index or trie for field names; a min-heap or time wheel for expirations) and justify the choices.
  - Discuss concurrency, memory footprint, and how to test corner cases: special characters in prefixes, very large numbers of fields, and mass expirations. A small corner-case check appears after this list.
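A tiny corner-case check, assuming the `Store`, `scan`, and `scan_prefix` sketches above: punctuation in field names must sort by raw codepoint, and an expired field must vanish from reads at later timestamps.

```python
store = Store()
store.set("k", "a(1)", "x", write_timestamp=10, ttl=100)  # expires at 110
store.set("k", "a-1", "y", write_timestamp=10, ttl=100)   # expires at 110
store.set("k", "temp", "z", write_timestamp=10, ttl=5)    # expires at 15

# '(' (U+0028) sorts before '-' (U+002D), so "a(1)" precedes "a-1".
assert scan(store, "k", timestamp=12) == [
    ("a(1)", "x"), ("a-1", "y"), ("temp", "z")]

# "temp" expired at t=15, so it is absent from a read at t=20.
assert scan(store, "k", timestamp=20) == [("a(1)", "x"), ("a-1", "y")]

# A prefix containing punctuation needs no escaping.
assert scan_prefix(store, "k", "a(", timestamp=20) == [("a(1)", "x")]
```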