Design in-memory DB with TTL and backup
Company: Circle
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Take-home Project
##### Question
Design and implement an in-memory, versioned **key → (field → value)** database that supports per-field TTL, time-travel reads, lexicographically ordered scans, and snapshot backup/restore. Every read and write carries a logical timestamp `t`; all reads are evaluated *as of* `t` (not just "now"). `ttl` is expressed in the same units as `t`, and `ttl = null` means no expiry.
Implement the following timestamped operations:
1. **`set(key, field, value, t, ttl)`** — Upsert `value` at `(key, field)`. Maintain append-only history so each `(key, field)` has a sequence of validity intervals `[start, end)`. A field's value expires at `start + ttl` (when `ttl` is set). Define behavior when `ttl` is `null` (no expiry) and when `ttl` is non-positive. Updating an already-expired field is treated as a fresh insert.
2. **`get(key, field, t)`** — Return the value visible at time `t`, i.e. the latest version with `start ≤ t` that has not expired (`t < start + ttl` when TTL is set, and the interval has not been closed/tombstoned by `t`). Otherwise return `null`.
3. **`delete(key, field, t)`** — Log a **tombstone** that ends the current interval at `t` while preserving prior history for future replay/restore.
4. **`scan(key, t)`** — Return all non-expired `(field, value)` pairs under `key` as of `t`, ordered **lexicographically by raw field name**.
5. **`scanPrefix(key, prefix, t)`** — Same as `scan`, but only fields whose name starts with `prefix`, again ordered lexicographically.
6. **`backup(t)`** — Record a snapshot marker at time `t` and return the total count of non-expired `(key, field)` pairs across the whole database as of `t`.
7. **`restore(t)`** — Rewind the database's observable state to the most recent backup at or before `t`, and return the count that backup had recorded. Because deletions are tombstones, history remains replayable.
**Requirements and clarifications:**
- **TTL granularity:** TTL applies at the `(key, field)` level, not the key level. All reads and scans must exclude entries expired as of the query timestamp. Describe how you store per-field expiry and how you garbage-collect or lazily purge expired data — while keeping enough history to `restore` to any retained backup.
- **Deterministic ordering with special characters:** Scans must be deterministically ordered by the raw field name. Field names and prefixes may contain parentheses, punctuation, or other delimiters (e.g. `"a(1)"`, `"a(2)"`). Explain how you keep ordering correct (e.g. compare raw field strings by Unicode code point / UTF-8 byte order, store the ordering key separately from any serialized metadata, and never tokenize or split on punctuation).
- **Time-travel semantics:** Reads use the latest version with `start ≤ t`; a later write at `t' > t` must not be visible at `t`. Describe how you track `[start, end)`, value, ttl, and tombstones per `(key, field)` to support this.
- **Complexity & trade-offs:** Give expected time and space complexity for every operation. Discuss per-key ordered maps (for ordered/prefix scans) vs. brute-force-plus-sort, and global vs. per-key expiry indexes for answering `backup(t)` at arbitrary `t`. Discuss concurrency and how you would test corner cases (special characters in prefixes, very large field counts, mass expirations).
Quick Answer: A Circle software-engineer take-home: design a versioned in-memory key→(field→value) database with per-field TTL, append-only history and delete tombstones, lexicographically ordered scan and scanPrefix (correct even with special characters), and snapshot backup/restore counters. It exercises data modeling, time-travel visibility semantics, ordered indexing, garbage collection, and per-operation complexity trade-offs.