Implement in-memory DB querying
Company: OpenAI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's competency in data structures, algorithms, and systems-level thinking for implementing query processing in an in-memory database, including projection, predicate filtering, ordering, and index design.
Part 1: Project Selected Columns from an In-Memory Table
Constraints
- 0 <= len(rows) <= 10000
- 0 <= len(selected_columns) <= 100
- Each row is a dictionary with string keys
- Do not modify the input rows in place
Examples
Input: ([{'id': '1', 'name': 'Ada', 'birthday': '1815-12-10'}, {'id': '2', 'name': 'Charles', 'birthday': '1791-12-26'}], ['id', 'name'])
Expected Output: [{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Charles'}]
Explanation: Keep only the id and name columns from each row.
Input: ([{'id': '1', 'name': 'Ada'}], ['name'])
Expected Output: [{'name': 'Ada'}]
Explanation: A single-row table should still project correctly.
Input: ([], ['id'])
Expected Output: []
Explanation: Projecting an empty table returns an empty table.
Input: ([{'id': '1'}, {'name': 'Ada'}], ['id', 'name'])
Expected Output: [{'id': '1', 'name': None}, {'id': None, 'name': 'Ada'}]
Explanation: Missing requested columns should appear with value None.
Input: ([{'id': '1'}, {'id': '2'}], [])
Expected Output: [{}, {}]
Explanation: Selecting zero columns returns one empty row per input row.
Solution
def solution(rows, selected_columns):
result = []
for row in rows:
projected = {}
for column in selected_columns:
projected[column] = row.get(column, None)
result.append(projected)
return resultTime complexity: O(n * k), where n is the number of rows and k is the number of selected columns. Space complexity: O(n * k) for the output table.
Hints
- Build a fresh output row for each input row instead of deleting keys from the original.
- A dictionary get lookup is useful when a selected column may be missing.
Part 2: Filter Rows with Simple WHERE Conditions
Constraints
- 0 <= len(rows) <= 10000
- 0 <= len(conditions) <= 20
- Each operator is one of '=', '!=', '<', '<=', '>', '>='
- Values compared within a condition are mutually comparable
Examples
Input: ([{'id': 1, 'age': 36}, {'id': 2, 'age': 28}, {'id': 3, 'age': 36}], [('age', '=', 36)])
Expected Output: [{'id': 1, 'age': 36}, {'id': 3, 'age': 36}]
Explanation: Only rows with age equal to 36 should remain.
Input: ([{'id': 1, 'age': 36, 'name': 'Ada'}, {'id': 2, 'age': 28, 'name': 'Bob'}, {'id': 3, 'age': 40, 'name': 'Ada'}], [('name', '=', 'Ada'), ('age', '>', 36)])
Expected Output: [{'id': 3, 'age': 40, 'name': 'Ada'}]
Explanation: A row must satisfy both conditions.
Input: ([{'id': '1', 'birthday': '1815-12-10'}, {'id': '2', 'birthday': '1791-12-26'}], [('birthday', '<', '1800-01-01')])
Expected Output: [{'id': '2', 'birthday': '1791-12-26'}]
Explanation: ISO date strings can be compared lexicographically.
Input: ([{'id': 1}, {'id': 2}], [])
Expected Output: [{'id': 1}, {'id': 2}]
Explanation: No WHERE conditions means every row matches.
Input: ([{'id': 1}, {'id': 2, 'name': 'Ada'}], [('name', '=', 'Ada')])
Expected Output: [{'id': 2, 'name': 'Ada'}]
Explanation: Rows missing the filtered column do not match.
Solution
def solution(rows, conditions):
def matches(row):
for column, operator, value in conditions:
if column not in row:
return False
current = row[column]
if operator == '=':
ok = current == value
elif operator == '!=':
ok = current != value
elif operator == '<':
ok = current < value
elif operator == '<=':
ok = current <= value
elif operator == '>':
ok = current > value
elif operator == '>=':
ok = current >= value
else:
raise ValueError('Unsupported operator: ' + str(operator))
if not ok:
return False
return True
return [dict(row) for row in rows if matches(row)]Time complexity: O(n * m), where n is the number of rows and m is the number of conditions. Space complexity: O(r), where r is the number of matching rows returned.
Hints
- Write a helper that checks whether one row satisfies all conditions.
- An empty list of conditions should match every row.
Part 3: Sort Rows with ORDER BY on Multiple Columns
Constraints
- 0 <= len(rows) <= 10000
- 0 <= len(order_columns) <= 10
- len(order_columns) == len(ascending_flags)
- Every order column exists in every row and contains comparable values
Examples
Input: ([{'id': 1, 'name': 'Charles'}, {'id': 2, 'name': 'Ada'}, {'id': 3, 'name': 'Bob'}], ['name'], [True])
Expected Output: [{'id': 2, 'name': 'Ada'}, {'id': 3, 'name': 'Bob'}, {'id': 1, 'name': 'Charles'}]
Explanation: Sort by name ascending.
Input: ([{'id': 1, 'age': 30}, {'id': 2, 'age': 20}, {'id': 3, 'age': 40}], ['age'], [False])
Expected Output: [{'id': 3, 'age': 40}, {'id': 1, 'age': 30}, {'id': 2, 'age': 20}]
Explanation: Sort by age descending.
Input: ([{'id': 1, 'age': 30, 'name': 'Bob'}, {'id': 2, 'age': 30, 'name': 'Ada'}, {'id': 3, 'age': 25, 'name': 'Zoe'}, {'id': 4, 'age': 30, 'name': 'Ada'}], ['age', 'name', 'id'], [True, True, False])
Expected Output: [{'id': 3, 'age': 25, 'name': 'Zoe'}, {'id': 4, 'age': 30, 'name': 'Ada'}, {'id': 2, 'age': 30, 'name': 'Ada'}, {'id': 1, 'age': 30, 'name': 'Bob'}]
Explanation: First sort by age ascending, then name ascending, then id descending to break ties.
Input: ([], ['name'], [True])
Expected Output: []
Explanation: Sorting an empty table returns an empty table.
Input: ([{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Bob'}], [], [])
Expected Output: [{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Bob'}]
Explanation: No ORDER BY columns means keep the original order.
Solution
def solution(rows, order_columns, ascending_flags):
if len(order_columns) != len(ascending_flags):
raise ValueError('order_columns and ascending_flags must have the same length')
result = [dict(row) for row in rows]
for column, ascending in reversed(list(zip(order_columns, ascending_flags))):
result.sort(key=lambda row: row[column], reverse=not ascending)
return resultTime complexity: O(k * n log n), where n is the number of rows and k is the number of ORDER BY columns. Space complexity: O(n) for the copied output rows.
Hints
- Python's sort is stable, so sorting from the last key to the first key is a clean way to handle mixed directions.
- Make a copy of the rows before sorting if you do not want to mutate the input.
Part 4: Build a Simple Equality Index for Fast Lookups
Constraints
- 0 <= len(rows) <= 100000
- 0 <= len(lookup_values) <= 100000
- Indexed values are hashable
- Ignore rows missing the indexed column
Examples
Input: ([{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Charles'}, {'id': '3', 'name': 'Ada'}], 'name', ['Ada', 'Charles', 'Eve'])
Expected Output: [[0, 2], [1], []]
Explanation: The index groups all row positions for each name.
Input: ([{'id': 1, 'age': 36}, {'id': 2, 'age': 28}, {'id': 3, 'age': 36}, {'id': 4, 'age': 40}], 'age', [40, 36])
Expected Output: [[3], [0, 2]]
Explanation: Multiple rows can share the same indexed value.
Input: ([{'id': 1}, {'id': 2, 'city': 'Paris'}, {'id': 3, 'city': 'Paris'}], 'city', ['Paris', 'London'])
Expected Output: [[1, 2], []]
Explanation: Rows missing the indexed column are ignored.
Input: ([], 'name', ['Ada'])
Expected Output: [[]]
Explanation: An empty table yields no matches for any lookup.
Solution
def solution(rows, index_column, lookup_values):
index = {}
for i, row in enumerate(rows):
if index_column in row:
key = row[index_column]
index.setdefault(key, []).append(i)
return [list(index.get(value, [])) for value in lookup_values]Time complexity: O(n + q + total_output_size) on average, where n is the number of rows and q is the number of lookups. Space complexity: O(n) for the index.
Hints
- Build the mapping from value to row positions once, then answer each lookup in O(1) average time.
- If a value appears multiple times, store all matching positions, not just one.