Implement an in-memory database with TTL and backup
Company: xAI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Take-home Project
Quick Answer: This question evaluates proficiency in designing and implementing mutable in-memory data structures, field-level CRUD and scanning, timestamped operations with TTL semantics, and backup/restore snapshot management, focusing on data structures, temporal reasoning, and state management in the Coding & Algorithms domain.
Part 1: Basic CRUD on In-Memory Database Fields
Constraints
- 0 <= len(queries) <= 100000
- Each key, field, and value is a string of length 1 to 100
- Only the operations 'SET', 'GET', and 'DELETE' appear
- Average O(1) hash-map operations may be assumed
Examples
Input: [['SET', 'user1', 'name', 'alice'], ['GET', 'user1', 'name'], ['SET', 'user1', 'name', 'bob'], ['GET', 'user1', 'name'], ['DELETE', 'user1', 'name'], ['GET', 'user1', 'name'], ['DELETE', 'user1', 'name']]
Expected Output: ['alice', 'bob', True, '', False]
Explanation: The value is overwritten from 'alice' to 'bob'. The first delete succeeds, and the second delete fails because the field is already gone.
Input: [['GET', 'missing', 'x'], ['DELETE', 'missing', 'x']]
Expected Output: ['', False]
Explanation: Missing keys and fields are treated as absent.
Input: [['SET', 'a', 'x', '1'], ['SET', 'a', 'y', '2'], ['DELETE', 'a', 'x'], ['GET', 'a', 'x'], ['GET', 'a', 'y']]
Expected Output: [True, '', '2']
Explanation: Deleting one field does not affect other fields under the same key.
Input: [['SET', 'k', 'f', 'v']]
Expected Output: []
Explanation: A single 'SET' query produces no output.
Solution
def solution(queries):
db = {}
out = []
for q in queries:
op = q[0]
if op == 'SET':
key, field, value = q[1], q[2], q[3]
db.setdefault(key, {})[field] = value
elif op == 'GET':
key, field = q[1], q[2]
out.append(db.get(key, {}).get(field, ''))
else: # DELETE
key, field = q[1], q[2]
if key in db and field in db[key]:
del db[key][field]
if not db[key]:
del db[key]
out.append(True)
else:
out.append(False)
return outTime complexity: O(Q) average, where Q is the number of queries.. Space complexity: O(T), where T is the number of stored fields..
Hints
- A nested dictionary works well: one map from key to another map of field to value.
- After deleting a field, you may remove the key entirely if it has no fields left.
Part 2: Sorted Scan and Prefix Scan in an In-Memory Database
Constraints
- 0 <= len(queries) <= 100000
- Each key, field, value, and prefix is a string of length 0 to 100
- Results for scan operations must be sorted lexicographically by field name
- Average O(1) hash-map operations may be assumed
Examples
Input: [['SET', 'user', 'b', '2'], ['SET', 'user', 'a', '1'], ['SCAN', 'user'], ['SCAN_BY_PREFIX', 'user', 'a']]
Expected Output: [['a(1)', 'b(2)'], ['a(1)']]
Explanation: Fields are returned in lexicographic order, and the prefix scan keeps only fields starting with 'a'.
Input: [['SET', 'k', 'name', 'Ann'], ['DELETE', 'k', 'age'], ['GET', 'k', 'age'], ['SCAN_BY_PREFIX', 'k', 'z'], ['SCAN', 'missing']]
Expected Output: [False, '', [], []]
Explanation: Deleting a missing field returns False. A missing field returns '', and scans with no matches return empty lists.
Input: [['SET', 'p', 'ab', '1'], ['SET', 'p', 'aa', '2'], ['SET', 'p', 'ab', '3'], ['DELETE', 'p', 'aa'], ['SCAN', 'p'], ['SCAN_BY_PREFIX', 'p', 'a']]
Expected Output: [True, ['ab(3)'], ['ab(3)']]
Explanation: Overwriting 'ab' updates its value, and deleting 'aa' removes it from both scans.
Input: [['SET', 'x', 'field', 'v'], ['SCAN_BY_PREFIX', 'x', 'field']]
Expected Output: [['field(v)']]
Explanation: A single field that matches the prefix is returned in the required format.
Solution
def solution(queries):
db = {}
out = []
def format_scan(key, prefix=None):
record = db.get(key, {})
fields = [field for field in record if prefix is None or field.startswith(prefix)]
fields.sort()
return [f'{field}({record[field]})' for field in fields]
for q in queries:
op = q[0]
if op == 'SET':
key, field, value = q[1], q[2], q[3]
db.setdefault(key, {})[field] = value
elif op == 'GET':
key, field = q[1], q[2]
out.append(db.get(key, {}).get(field, ''))
elif op == 'DELETE':
key, field = q[1], q[2]
if key in db and field in db[key]:
del db[key][field]
if not db[key]:
del db[key]
out.append(True)
else:
out.append(False)
elif op == 'SCAN':
key = q[1]
out.append(format_scan(key))
else: # SCAN_BY_PREFIX
key, prefix = q[1], q[2]
out.append(format_scan(key, prefix))
return outTime complexity: SET/GET/DELETE are O(1) average. Each SCAN or SCAN_BY_PREFIX is O(F log F), where F is the number of fields under the queried key.. Space complexity: O(T), where T is the number of stored fields..
Hints
- Keep the same nested dictionary structure from basic CRUD.
- For a scan, collect matching field names first, sort them, then build the 'field(value)' strings.
Part 3: Timestamped In-Memory Database with TTL
Constraints
- 0 <= len(queries) <= 100000
- All timestamps are decimal strings representing integers in [0, 10^9]
- All ttl values are positive decimal strings representing integers in [1, 10^9]
- The current timestamps in the queries are non-decreasing
Examples
Input: [['SET_AT_WITH_TTL', 'k', 'a', 'x', '10', '5'], ['GET_AT', 'k', 'a', '10'], ['GET_AT', 'k', 'a', '14'], ['GET_AT', 'k', 'a', '15'], ['DELETE_AT', 'k', 'a', '15']]
Expected Output: ['x', 'x', '', False]
Explanation: The field is valid at times 10 through 14, but not at 15 because the TTL interval is [10, 15).
Input: [['SET_AT', 'u', 'name', 'Ann', '1'], ['SET_AT_WITH_TTL', 'u', 'age', '20', '2', '3'], ['SCAN_AT', 'u', '3'], ['SET_AT', 'u', 'age', '21', '4'], ['GET_AT', 'u', 'age', '6'], ['SCAN_BY_PREFIX_AT', 'u', 'a', '6']]
Expected Output: [['age(20)', 'name(Ann)'], '21', ['age(21)']]
Explanation: The TTL version of 'age' is alive at time 3. It is overwritten at time 4 by a non-TTL value, so it still exists at time 6.
Input: [['SET_AT_WITH_TTL', 'x', 'f', '1', '5', '2'], ['DELETE_AT', 'x', 'f', '6'], ['GET_AT', 'x', 'f', '6'], ['SET_AT_WITH_TTL', 'x', 'g', '2', '8', '1'], ['DELETE_AT', 'x', 'g', '9'], ['SCAN_AT', 'x', '9']]
Expected Output: [True, '', False, []]
Explanation: The first delete happens while 'f' is still alive, so it succeeds. The second delete happens exactly at expiry time for 'g', so it fails.
Input: [['SCAN_AT', 'missing', '100'], ['GET_AT', 'missing', 'a', '100']]
Expected Output: [[], '']
Explanation: Scanning or reading a missing key returns an empty list or empty string.
Solution
def solution(queries):
db = {}
out = []
def purge_key(key, t):
record = db.get(key)
if record is None:
return
dead = [field for field, (_, expire_at) in record.items() if expire_at is not None and t >= expire_at]
for field in dead:
del record[field]
if not record:
del db[key]
def scan_list(key, prefix=None):
record = db.get(key, {})
fields = [field for field in record if prefix is None or field.startswith(prefix)]
fields.sort()
return [f'{field}({record[field][0]})' for field in fields]
for q in queries:
op = q[0]
if op == 'SET_AT':
key, field, value, t = q[1], q[2], q[3], int(q[4])
purge_key(key, t)
db.setdefault(key, {})[field] = [value, None]
elif op == 'SET_AT_WITH_TTL':
key, field, value, t, ttl = q[1], q[2], q[3], int(q[4]), int(q[5])
purge_key(key, t)
db.setdefault(key, {})[field] = [value, t + ttl]
elif op == 'GET_AT':
key, field, t = q[1], q[2], int(q[3])
purge_key(key, t)
record = db.get(key, {})
out.append(record[field][0] if field in record else '')
elif op == 'DELETE_AT':
key, field, t = q[1], q[2], int(q[3])
purge_key(key, t)
if key in db and field in db[key]:
del db[key][field]
if not db[key]:
del db[key]
out.append(True)
else:
out.append(False)
elif op == 'SCAN_AT':
key, t = q[1], int(q[2])
purge_key(key, t)
out.append(scan_list(key))
else: # SCAN_BY_PREFIX_AT
key, prefix, t = q[1], q[2], int(q[3])
purge_key(key, t)
out.append(scan_list(key, prefix))
return outTime complexity: Point operations are O(1) average plus lazy cleanup on the touched key. Each scan is O(F log F), where F is the number of live fields under the queried key.. Space complexity: O(T), where T is the number of live stored fields..
Hints
- Store an absolute expiration time like expire_at = timestamp + ttl instead of storing ttl directly.
- Because time never goes backward, you can lazily remove expired fields whenever a key is touched.
Part 4: Backup and Restore in a Timestamped Database with TTL
Constraints
- 0 <= len(queries) <= 100000
- All timestamps, ttl values, and restore arguments are decimal strings representing integers in [0, 10^9]
- All ttl values are positive
- The current time arguments for SET/GET/DELETE/SCAN/BACKUP and RESTORE-now are non-decreasing
- timestamp_to_restore may point to any past time
Examples
Input: [['SET_AT_WITH_TTL', 'doc', 'a', '1', '10', '5'], ['BACKUP', '12'], ['GET_AT', 'doc', 'a', '13'], ['RESTORE', '20', '12'], ['SCAN_AT', 'doc', '22'], ['GET_AT', 'doc', 'a', '23']]
Expected Output: ['1', ['a(1)'], '']
Explanation: At backup time 12, the field has 3 units of TTL left. After restoring at time 20, it is alive at 22 and expires at 23.
Input: [['SET_AT', 'k', 'x', 'A', '1'], ['BACKUP', '2'], ['SET_AT', 'k', 'x', 'B', '3'], ['BACKUP', '4'], ['DELETE_AT', 'k', 'x', '5'], ['RESTORE', '6', '3'], ['GET_AT', 'k', 'x', '6'], ['RESTORE', '7', '100'], ['GET_AT', 'k', 'x', '7']]
Expected Output: [True, 'A', 'B']
Explanation: Restoring to time 3 uses the backup from time 2, so the value is 'A'. Restoring to time 100 uses the later backup from time 4, so the value is 'B'.
Input: [['SET_AT', 'p', 'aa', '1', '1'], ['SET_AT_WITH_TTL', 'p', 'ab', '2', '2', '3'], ['BACKUP', '5'], ['RESTORE', '8', '5'], ['SCAN_BY_PREFIX_AT', 'p', 'a', '8']]
Expected Output: [['aa(1)']]
Explanation: Field 'ab' expires exactly at time 5, so it is not included in the backup taken at time 5.
Input: [['SET_AT', 'u', 'f', 'v', '1'], ['BACKUP', '3'], ['DELETE_AT', 'u', 'f', '4'], ['RESTORE', '10', '2'], ['GET_AT', 'u', 'f', '10'], ['SCAN_AT', 'u', '10']]
Expected Output: [True, '', []]
Explanation: There is no backup with time <= 2, so the restore loads an empty database.
Solution
from bisect import bisect_right
def solution(queries):
db = {}
backups = []
backup_times = []
out = []
def purge_key(key, t):
record = db.get(key)
if record is None:
return
dead = [field for field, (_, expire_at) in record.items() if expire_at is not None and t >= expire_at]
for field in dead:
del record[field]
if not record:
del db[key]
def scan_list(key, prefix=None):
record = db.get(key, {})
fields = [field for field in record if prefix is None or field.startswith(prefix)]
fields.sort()
return [f'{field}({record[field][0]})' for field in fields]
def build_snapshot(t):
snapshot = {}
live_db = {}
for key, record in db.items():
snap_record = {}
live_record = {}
for field, (value, expire_at) in record.items():
if expire_at is None or t < expire_at:
remaining = None if expire_at is None else expire_at - t
snap_record[field] = [value, remaining]
live_record[field] = [value, expire_at]
if snap_record:
snapshot[key] = snap_record
if live_record:
live_db[key] = live_record
return snapshot, live_db
for q in queries:
op = q[0]
if op == 'SET_AT':
key, field, value, t = q[1], q[2], q[3], int(q[4])
purge_key(key, t)
db.setdefault(key, {})[field] = [value, None]
elif op == 'SET_AT_WITH_TTL':
key, field, value, t, ttl = q[1], q[2], q[3], int(q[4]), int(q[5])
purge_key(key, t)
db.setdefault(key, {})[field] = [value, t + ttl]
elif op == 'GET_AT':
key, field, t = q[1], q[2], int(q[3])
purge_key(key, t)
record = db.get(key, {})
out.append(record[field][0] if field in record else '')
elif op == 'DELETE_AT':
key, field, t = q[1], q[2], int(q[3])
purge_key(key, t)
if key in db and field in db[key]:
del db[key][field]
if not db[key]:
del db[key]
out.append(True)
else:
out.append(False)
elif op == 'SCAN_AT':
key, t = q[1], int(q[2])
purge_key(key, t)
out.append(scan_list(key))
elif op == 'SCAN_BY_PREFIX_AT':
key, prefix, t = q[1], q[2], int(q[3])
purge_key(key, t)
out.append(scan_list(key, prefix))
elif op == 'BACKUP':
t = int(q[1])
snapshot, db = build_snapshot(t)
backups.append(snapshot)
backup_times.append(t)
else: # RESTORE
now, target = int(q[1]), int(q[2])
idx = bisect_right(backup_times, target) - 1
if idx < 0:
db = {}
else:
snapshot = backups[idx]
restored = {}
for key, record in snapshot.items():
restored[key] = {}
for field, (value, remaining) in record.items():
expire_at = None if remaining is None else now + remaining
restored[key][field] = [value, expire_at]
db = restored
return outTime complexity: Point operations are O(1) average plus lazy cleanup on the touched key; each scan is O(F log F); each BACKUP is O(N); each RESTORE is O(L + log K), where F is fields under one key, N is the number of live fields in the whole database, L is the number of fields in the restored snapshot, and K is the number of backups.. Space complexity: O(N + S), where N is the current number of live fields and S is the total number of fields stored across all backups..
Hints
- At backup time t, store remaining_ttl = expire_at - t for live TTL fields.
- Since backups are taken in time order, you can keep them in a list and use binary search to find the latest backup with time <= timestamp_to_restore.