How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium-difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at OpenAI.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at OpenAI during technical interviews.

Implement in-memory DB querying | OpenAI Coding Question

Implement in-memory DB querying

Company: OpenAI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

##### Question

Implement an in-memory database that supports:

1. Querying the whole table and returning only selected columns (projection).
2. Adding WHERE clause filtering with simple conditions like (column, operator, value).
3. Adding ORDER BY on one or more columns with ascending/descending control.
4. Explaining how you would design and build an index to accelerate such queries (no code required).

Example public API:

    db = DB()
    db.insert("users", {"id": "1", "name": "Ada", "birthday": "1815-12-10"})
    …
    db.query("users", ["id"], conditions=[("name", "=", "Charles")], order_by=(["birthday"], False))  # returns sorted projection

Quick Answer: This question evaluates a candidate's competency in data structures, algorithms, and systems-level thinking for implementing query processing in an in-memory database, including projection, predicate filtering, ordering, and index design.
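As a rough illustration of how the three query features could fit behind the example API, here is a minimal sketch. It is not an official solution. It assumes that `order_by` is a pair `(columns, ascending)` whose single boolean applies to every listed column (the question does not spell this out), that every ordered column is present in every matching row, and that a table is stored as a plain list of dictionaries. The `_OPS` table and the `_tables` attribute are names made up for this sketch.

```python
import operator

# Map the operator strings from the question to Python comparison functions.
_OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


class DB:
    def __init__(self):
        self._tables = {}  # table name -> list of row dicts, in insertion order

    def insert(self, table, row):
        # Store a copy so later changes to the caller's dict do not leak in.
        self._tables.setdefault(table, []).append(dict(row))

    def query(self, table, columns, conditions=(), order_by=None):
        # WHERE: keep rows that satisfy every condition (missing column = no match).
        rows = [
            row for row in self._tables.get(table, [])
            if all(col in row and _OPS[op](row[col], value)
                   for col, op, value in conditions)
        ]
        # ORDER BY: assumption, order_by is (columns, ascending) with one flag
        # shared by all columns; stable sorts run from the last key to the first.
        if order_by is not None:
            order_columns, ascending = order_by
            for col in reversed(order_columns):
                rows.sort(key=lambda row, c=col: row[c], reverse=not ascending)
        # Projection: fresh dicts holding only the requested columns.
        return [{col: row.get(col) for col in columns} for row in rows]
```

Filtering before sorting keeps the sort small, and projecting last means the WHERE and ORDER BY steps can still see columns that are not selected.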

Part 1: Project Selected Columns from an In-Memory Table

Implement **relational projection** over an in-memory table.

You are given a table as a **list of rows**, where each row is a dictionary mapping string column names to values. Given a list of column names to keep, return a **new table** that contains only those columns for every row.

## Function

```python
def solution(rows, selected_columns):
    ...
```

- **`rows`**: a list of dictionaries. Each dictionary represents one row, with string keys (column names).
- **`selected_columns`**: a list of strings naming the columns to project, in the order they should appear in the output.

## What to return

A **new list of dictionaries**, one output row per input row, where each output row contains **exactly** the columns named in `selected_columns`:

- Keep the rows in their **original order**.
- In each output row, include **only** the keys in `selected_columns`, in that **same order**. Drop any other columns present in the source row.
- If a requested column is **missing** from a source row, still include that key in the output with value **`None`**.

Do **not** modify the input `rows` (build fresh dictionaries; the source rows are read-only).

## Examples

- `rows = [{"id": "1", "name": "Ada", "birthday": "1815-12-10"}, {"id": "2", "name": "Charles", "birthday": "1791-12-26"}]`, `selected_columns = ["id", "name"]` → `[{"id": "1", "name": "Ada"}, {"id": "2", "name": "Charles"}]` (the `birthday` column is dropped).
- `rows = [{"id": "1"}, {"name": "Ada"}]`, `selected_columns = ["id", "name"]` → `[{"id": "1", "name": None}, {"id": None, "name": "Ada"}]` (missing columns become `None`).

## Edge cases

- If `rows` is empty, return `[]`.
- If `selected_columns` is empty, return one **empty dictionary** `{}` for each input row (e.g. two rows → `[{}, {}]`).

## Constraints

- `0 <= len(rows) <= 10000`
- `0 <= len(selected_columns) <= 100`
- Each row is a dictionary with string keys.
- Do not modify the input rows in place.

Constraints

0 <= len(rows) <= 10000
0 <= len(selected_columns) <= 100
Each row is a dictionary with string keys
Do not modify the input rows in place

Examples

Input: ([{'id': '1', 'name': 'Ada', 'birthday': '1815-12-10'}, {'id': '2', 'name': 'Charles', 'birthday': '1791-12-26'}], ['id', 'name'])

Expected Output: [{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Charles'}]

Explanation: Keep only the id and name columns from each row.

Input: ([{'id': '1', 'name': 'Ada'}], ['name'])

Expected Output: [{'name': 'Ada'}]

Explanation: A single-row table should still project correctly.

Input: ([], ['id'])

Expected Output: []

Explanation: Projecting an empty table returns an empty table.

Input: ([{'id': '1'}, {'name': 'Ada'}], ['id', 'name'])

Expected Output: [{'id': '1', 'name': None}, {'id': None, 'name': 'Ada'}]

Explanation: Missing requested columns should appear with value None.

Input: ([{'id': '1'}, {'id': '2'}], [])

Expected Output: [{}, {}]

Explanation: Selecting zero columns returns one empty row per input row.

Hints

Build a fresh output row for each input row instead of deleting keys from the original.
A dictionary get lookup is useful when a selected column may be missing.
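Following the two hints above, one possible way to write the projection is a nested comprehension. This is a sketch, not the official solution. It relies on `dict.get` returning `None` for a missing key and on Python dictionaries preserving insertion order, so the output keys follow `selected_columns`.

```python
def solution(rows, selected_columns):
    # Build a brand-new dict for every row, so the input rows are never touched.
    # row.get(col) yields None when the column is missing from that row.
    return [{col: row.get(col) for col in selected_columns} for row in rows]
```

With an empty `selected_columns` the inner comprehension produces `{}` for each row, and with empty `rows` the outer one produces `[]`, so neither edge case needs special handling.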

Part 2: Filter Rows with Simple WHERE Conditions

Filter the rows of a table that satisfy **every** condition in a set of simple SQL-style `WHERE` clauses.

## What to implement

Implement `solution(rows, conditions)`.

- **`rows`**: a list of rows, where each row is a dictionary mapping a column name (string) to its value.
- **`conditions`**: a list of conditions, where each condition is a tuple `(column, operator, value)`.

Return the list of rows that match **all** of the given conditions (the conditions are combined with logical **AND**), in their **original order**.

## Operators

`operator` is one of the following comparison operators, each applied as `row[column] operator value`:

| Operator | Meaning               |
|----------|-----------------------|
| `=`      | equal to              |
| `!=`     | not equal to          |
| `<`      | less than             |
| `<=`     | less than or equal    |
| `>`      | greater than          |
| `>=`     | greater than or equal |

## Matching rules

- A row matches only if it satisfies **every** condition. If any condition fails, the row is excluded.
- **Missing column:** if a row does not contain the `column` referenced by a condition, that condition is treated as **not matching** (the row is excluded).
- **No conditions:** if `conditions` is empty, every row matches and all rows are returned (in original order).
- Within a single condition, `row[column]` and `value` are comparable (e.g. numbers with numbers, strings with strings).

## Examples

**Example 1**

```
rows = [{'id': 1, 'age': 36}, {'id': 2, 'age': 28}, {'id': 3, 'age': 36}]
conditions = [('age', '=', 36)]
returns [{'id': 1, 'age': 36}, {'id': 3, 'age': 36}]
```

**Example 2** (multiple conditions are ANDed together):

```
rows = [
    {'id': 1, 'age': 36, 'name': 'Ada'},
    {'id': 2, 'age': 28, 'name': 'Bob'},
    {'id': 3, 'age': 40, 'name': 'Ada'},
]
conditions = [('name', '=', 'Ada'), ('age', '>', 36)]
returns [{'id': 3, 'age': 40, 'name': 'Ada'}]
```

**Example 3** (a row missing the referenced column does not match):

```
rows = [{'id': 1}, {'id': 2, 'name': 'Ada'}]
conditions = [('name', '=', 'Ada')]
returns [{'id': 2, 'name': 'Ada'}]
```

## Constraints

- `0 <= len(rows) <= 10000`
- `0 <= len(conditions) <= 20`
- Each operator is one of `=`, `!=`, `<`, `<=`, `>`, `>=`.
- Values compared within a condition are mutually comparable.

Constraints

0 <= len(rows) <= 10000
0 <= len(conditions) <= 20
Each operator is one of '=', '!=', '<', '<=', '>', '>='
Values compared within a condition are mutually comparable

Examples

Input: ([{'id': 1, 'age': 36}, {'id': 2, 'age': 28}, {'id': 3, 'age': 36}], [('age', '=', 36)])

Expected Output: [{'id': 1, 'age': 36}, {'id': 3, 'age': 36}]

Explanation: Only rows with age equal to 36 should remain.

Input: ([{'id': 1, 'age': 36, 'name': 'Ada'}, {'id': 2, 'age': 28, 'name': 'Bob'}, {'id': 3, 'age': 40, 'name': 'Ada'}], [('name', '=', 'Ada'), ('age', '>', 36)])

Expected Output: [{'id': 3, 'age': 40, 'name': 'Ada'}]

Explanation: A row must satisfy both conditions.

Input: ([{'id': '1', 'birthday': '1815-12-10'}, {'id': '2', 'birthday': '1791-12-26'}], [('birthday', '<', '1800-01-01')])

Expected Output: [{'id': '2', 'birthday': '1791-12-26'}]

Explanation: ISO date strings can be compared lexicographically.

Input: ([{'id': 1}, {'id': 2}], [])

Expected Output: [{'id': 1}, {'id': 2}]

Explanation: No WHERE conditions means every row matches.

Input: ([{'id': 1}, {'id': 2, 'name': 'Ada'}], [('name', '=', 'Ada')])

Expected Output: [{'id': 2, 'name': 'Ada'}]

Explanation: Rows missing the filtered column do not match.

Hints

Write a helper that checks whether one row satisfies all conditions.
An empty list of conditions should match every row.
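The helper suggested in the hints could look roughly like the sketch below. The names `OPS` and `row_matches` are made up for illustration, and the operator strings are mapped to functions from Python's standard `operator` module. The sketch returns the original row objects rather than copies, which the statement does not forbid; copy them with `dict(row)` if that matters in your setting.

```python
import operator

# Operator strings from the problem statement -> comparison functions.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def row_matches(row, conditions):
    # A row matches only if every condition holds (logical AND).
    for column, op, value in conditions:
        if column not in row:
            return False  # missing column: the condition does not match
        if not OPS[op](row[column], value):
            return False
    return True  # also reached when conditions is empty


def solution(rows, conditions):
    # A list comprehension preserves the original row order.
    return [row for row in rows if row_matches(row, conditions)]
```

Checking `column not in row` before comparing avoids a `KeyError` and also keeps `!=` from accidentally matching rows that lack the column.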

Part 3: Sort Rows with ORDER BY on Multiple Columns

Implement a multi-column `ORDER BY`: sort a table of rows by one or more columns, where each column has its own ascending/descending direction.

## Function

```python
def solution(rows, order_columns, ascending_flags):
    ...
```

## Input

- **`rows`**: the table, as a list of row dictionaries. Each row maps **column name → value**.
- **`order_columns`**: a list of column names to sort by, given in **priority order** (the first column is the primary sort key, the second breaks ties on the first, and so on).
- **`ascending_flags`**: a list of booleans, one per column in `order_columns` and aligned by position:
  - `True` → sort that column in **ascending** order
  - `False` → sort that column in **descending** order

So `ascending_flags[i]` is the direction for `order_columns[i]`.

## Output

Return a **new** list of rows sorted according to the ORDER BY rules.

- Sort **lexicographically** by the columns in priority order: compare on the first column; for rows that tie, compare on the second column; and so on through the list. Each comparison respects that column's own ascending/descending flag.
- Do **not** modify the input, neither the `rows` list nor any of the original row dictionaries. The returned rows should be fresh copies.

## Rules and edge cases

- **No order columns:** if `order_columns` is empty, return the rows in their **original order** (as fresh copies).
- **Empty table:** if `rows` is empty, return an empty list.

## Constraints

- `0 <= len(rows) <= 10000`
- `0 <= len(order_columns) <= 10`
- `len(order_columns) == len(ascending_flags)`
- Every column in `order_columns` exists in every row and holds **comparable** values.

Constraints

0 <= len(rows) <= 10000
0 <= len(order_columns) <= 10
len(order_columns) == len(ascending_flags)
Every order column exists in every row and contains comparable values

Examples

Input: ([{'id': 1, 'name': 'Charles'}, {'id': 2, 'name': 'Ada'}, {'id': 3, 'name': 'Bob'}], ['name'], [True])

Expected Output: [{'id': 2, 'name': 'Ada'}, {'id': 3, 'name': 'Bob'}, {'id': 1, 'name': 'Charles'}]

Explanation: Sort by name ascending.

Input: ([{'id': 1, 'age': 30}, {'id': 2, 'age': 20}, {'id': 3, 'age': 40}], ['age'], [False])

Expected Output: [{'id': 3, 'age': 40}, {'id': 1, 'age': 30}, {'id': 2, 'age': 20}]

Explanation: Sort by age descending.

Input: ([{'id': 1, 'age': 30, 'name': 'Bob'}, {'id': 2, 'age': 30, 'name': 'Ada'}, {'id': 3, 'age': 25, 'name': 'Zoe'}, {'id': 4, 'age': 30, 'name': 'Ada'}], ['age', 'name', 'id'], [True, True, False])

Expected Output: [{'id': 3, 'age': 25, 'name': 'Zoe'}, {'id': 4, 'age': 30, 'name': 'Ada'}, {'id': 2, 'age': 30, 'name': 'Ada'}, {'id': 1, 'age': 30, 'name': 'Bob'}]

Explanation: First sort by age ascending, then name ascending, then id descending to break ties.

Input: ([], ['name'], [True])

Expected Output: []

Explanation: Sorting an empty table returns an empty table.

Input: ([{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Bob'}], [], [])

Expected Output: [{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Bob'}]

Explanation: No ORDER BY columns means keep the original order.

Hints

Python's sort is stable, so sorting from the last key to the first key is a clean way to handle mixed directions.
Make a copy of the rows before sorting if you do not want to mutate the input.
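The stable-sort idea from the hints can be sketched as follows. This is one possible approach rather than the official solution: it copies each row with `dict(row)` (a shallow copy, which is enough when the values are plain strings or numbers) and then applies one `list.sort` pass per column, from the lowest-priority key to the highest.

```python
def solution(rows, order_columns, ascending_flags):
    # Fresh shallow copies, so neither the input list nor its dicts are mutated.
    result = [dict(row) for row in rows]

    # Sort by the least significant column first. Because list.sort is stable
    # (also with reverse=True), each later pass keeps the order established by
    # earlier passes among rows that tie on the current column.
    for column, ascending in reversed(list(zip(order_columns, ascending_flags))):
        result.sort(key=lambda row, c=column: row[c], reverse=not ascending)

    return result
```

Using one pass per column avoids having to negate values for descending order, which would not work for strings. The cost is one O(n log n) sort per order column, which is small given at most 10 columns and 10000 rows.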

Part 4: Build a Simple Equality Index for Fast Lookups

Build a simple **equality (hash) index** over a table, then use it to answer equality lookups quickly.

In many in-memory databases, repeated equality filters on a single column can be sped up by precomputing a map from each column value to the row positions that hold it. Your task is to build such an index and use it to resolve a batch of lookup values.

### Implement

```python
def solution(rows, index_column, lookup_values):
    ...
```

### Input

- **`rows`**: a list of rows, where each row is a dictionary mapping **column name → value**. Rows may have different sets of columns.
- **`index_column`**: the name (string) of the column to index on.
- **`lookup_values`**: a list of values to look up against the indexed column.

### Output

Return a list **parallel to `lookup_values`**. For each lookup value, in the same order, return the list of **0-based positions** of the rows whose value in `index_column` equals that lookup value.

- Positions within each list must be in **ascending order** (i.e. in the order the rows appear in `rows`).
- If a lookup value matches no row, return an **empty list** `[]` for it.
- The result has exactly one entry per lookup value. If the same lookup value appears more than once in `lookup_values`, it produces a separate matching list each time.

### Rules

- A row at position `i` matches a value `v` when `row[index_column] == v`.
- **Ignore any row that does not contain `index_column`**: such rows are never included in any result and their positions are skipped (but positions of the remaining rows are still their original indices in `rows`).
- Values in the indexed column are **hashable**.

### Examples

- `rows = [{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Charles'}, {'id': '3', 'name': 'Ada'}]`, `index_column = 'name'`, `lookup_values = ['Ada', 'Charles', 'Eve']` → `[[0, 2], [1], []]`
- `rows = [{'id': 1}, {'id': 2, 'city': 'Paris'}, {'id': 3, 'city': 'Paris'}]`, `index_column = 'city'`, `lookup_values = ['Paris', 'London']` → `[[1, 2], []]` (the first row has no `city`, so it is ignored, but rows 1 and 2 keep their original positions)
- `rows = []`, `index_column = 'name'`, `lookup_values = ['Ada']` → `[[]]`

### Constraints

- `0 <= len(rows) <= 100000`
- `0 <= len(lookup_values) <= 100000`
- Indexed values are hashable.
- Ignore rows missing the indexed column.

Constraints

0 <= len(rows) <= 100000
0 <= len(lookup_values) <= 100000
Indexed values are hashable
Ignore rows missing the indexed column

Examples

Input: ([{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Charles'}, {'id': '3', 'name': 'Ada'}], 'name', ['Ada', 'Charles', 'Eve'])

Expected Output: [[0, 2], [1], []]

Explanation: The index groups all row positions for each name.

Input: ([{'id': 1, 'age': 36}, {'id': 2, 'age': 28}, {'id': 3, 'age': 36}, {'id': 4, 'age': 40}], 'age', [40, 36])

Expected Output: [[3], [0, 2]]

Explanation: Multiple rows can share the same indexed value.

Input: ([{'id': 1}, {'id': 2, 'city': 'Paris'}, {'id': 3, 'city': 'Paris'}], 'city', ['Paris', 'London'])

Expected Output: [[1, 2], []]

Explanation: Rows missing the indexed column are ignored.

Input: ([], 'name', ['Ada'])

Expected Output: [[]]

Explanation: An empty table yields no matches for any lookup.

Hints

Build the mapping from value to row positions once, then answer each lookup in O(1) average time.
If a value appears multiple times, store all matching positions, not just one.
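The build-once, look-up-many approach from the hints might be sketched like this. It is an illustrative implementation, not the official one; `build_index` is a helper name invented here, and the index is a `collections.defaultdict` mapping each value to the list of row positions where it occurs.

```python
from collections import defaultdict


def build_index(rows, index_column):
    # value -> list of 0-based row positions, filled in a single pass.
    index = defaultdict(list)
    for position, row in enumerate(rows):
        if index_column in row:  # rows without the column are skipped
            index[row[index_column]].append(position)
    return index


def solution(rows, index_column, lookup_values):
    index = build_index(rows, index_column)
    # index.get does not create entries for unseen values; list(...) returns a
    # separate list per lookup, even when the same value is looked up twice.
    return [list(index.get(value, [])) for value in lookup_values]
```

Because positions are appended while scanning `rows` front to back, each list is already in ascending order. Building the index costs O(n), and each lookup is O(1) on average plus the time to copy the matching positions.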

