Design an In-Memory Database
Company: Valon
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's skills in data modeling, in-memory data structures, index design, and query/filter semantics for a flexible schema-less record store.
Part 1: Build a Schema-Less In-Memory Database
Constraints
- 0 <= len(records), len(queries) <= 2 * 10^5
- Each record is a dictionary from non-empty string column names to integer values
- Different records may contain different columns, and a record may be empty
- The total number of key/value pairs across all records is at most 2 * 10^5
- The total number of returned [record_id, value] pairs across all queries will not exceed 2 * 10^5
Examples
Input: ([{'age': 30, 'score': 90}, {'score': 80}, {'age': 25, 'level': 2}], ['age', 'score', 'level'])
Expected Output: [[[0, 30], [2, 25]], [[0, 90], [1, 80]], [[2, 2]]]
Explanation: Column 'age' appears in records 0 and 2, 'score' appears in records 0 and 1, and 'level' appears only in record 2.
Input: ([{'x': 1}, {'y': 2}], ['z', 'x'])
Expected Output: [[], [[0, 1]]]
Explanation: No record contains column 'z'. Only record 0 contains column 'x'.
Input: ([], ['a'])
Expected Output: [[]]
Explanation: With no inserted records, every query returns an empty list.
Input: ([{}, {'a': 5}, {'a': 7, 'b': 9}], ['a', 'b'])
Expected Output: [[[1, 5], [2, 7]], [[2, 9]]]
Explanation: The empty record contributes no columns. Column 'a' is present in records 1 and 2, and 'b' is present only in record 2.
Hints
- Instead of scanning every record for every query, build a reverse index from column name to all matching records while inserting.
- If you use insertion order as the record ID, the query results are automatically produced in sorted order.
Part 2: Filter a Schema-Less In-Memory Database
Constraints
- 0 <= len(records), len(filters) <= 2 * 10^5
- Each record and each filter is a dictionary from non-empty string column names to integer values
- A record matches a filter only if every column in the filter exists in the record and has the exact required value
- The total number of key/value pairs across all records plus the total number of conditions across all filters is at most 2 * 10^5
- The total size of all returned ID lists in the test data will not exceed 2 * 10^5
Examples
Input: ([{'a': 1, 'b': 2}, {'a': 1}, {'b': 2}, {'a': 2, 'b': 2}], [{'a': 1}, {'a': 1, 'b': 2}, {'b': 2}, {'c': 5}])
Expected Output: [[0, 1], [0], [0, 2, 3], []]
Explanation: The second filter requires both a=1 and b=2, so only record 0 matches. The last filter references a column that no record has.
Input: ([{'x': 5}, {'y': 5}, {'x': 5, 'y': 5}], [{'x': 5, 'y': 5}, {'y': 5}])
Expected Output: [[2], [1, 2]]
Explanation: For {'x': 5, 'y': 5}, records 0 and 1 are excluded because each is missing one required field.
Input: ([{'a': 1}, {}, {'b': 2}], [{}])
Expected Output: [[0, 1, 2]]
Explanation: An empty filter imposes no conditions, so every record matches.
Input: ([], [{'a': 1}, {}])
Expected Output: [[], []]
Explanation: With no records inserted, even the empty filter returns an empty list of record IDs.
Hints
- For each exact-match condition like column=value, store the sorted list of record IDs that satisfy it.
- A multi-column filter can be answered by intersecting the record ID lists for each condition; missing columns naturally produce no matches.