PracHub
QuestionsPremiumLearningGuidesCheatsheetNEW

Quick Overview

This question evaluates a candidate's competency in data structures, algorithms, and systems-level thinking for implementing query processing in an in-memory database, including projection, predicate filtering, ordering, and index design.

  • Medium
  • OpenAI
  • Coding & Algorithms
  • Software Engineer

Implement in-memory DB querying

Company: OpenAI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

##### Question Implement an in-memory database that supports: 1. Querying the whole table and returning only selected columns (projection). 2. Adding WHERE clause filtering with simple conditions like (column, operator, value). 3. Adding ORDER BY on one or more columns with ascending/descending control. 4. Explaining how you would design and build an index to accelerate such queries (no code required). Example public API: db = DB() db.insert("users", {"id": "1", "name": "Ada", "birthday": "1815-12-10"}) … db.query("users", ["id"], conditions=[("name", "=", "Charles")], order_by=(["birthday"], False)) # returns sorted projection

Quick Answer: This question evaluates a candidate's competency in data structures, algorithms, and systems-level thinking for implementing query processing in an in-memory database, including projection, predicate filtering, ordering, and index design.

Part 1: Project Selected Columns from an In-Memory Table

You are given an in-memory table represented as a list of row dictionaries. Write a function that performs projection: return a new table containing only the selected columns for every row. Preserve the original row order. If a requested column does not exist in a row, include it with value None.

Constraints

  • 0 <= len(rows) <= 10000
  • 0 <= len(selected_columns) <= 100
  • Each row is a dictionary with string keys
  • Do not modify the input rows in place

Examples

Input: ([{'id': '1', 'name': 'Ada', 'birthday': '1815-12-10'}, {'id': '2', 'name': 'Charles', 'birthday': '1791-12-26'}], ['id', 'name'])

Expected Output: [{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Charles'}]

Explanation: Keep only the id and name columns from each row.

Input: ([{'id': '1', 'name': 'Ada'}], ['name'])

Expected Output: [{'name': 'Ada'}]

Explanation: A single-row table should still project correctly.

Input: ([], ['id'])

Expected Output: []

Explanation: Projecting an empty table returns an empty table.

Input: ([{'id': '1'}, {'name': 'Ada'}], ['id', 'name'])

Expected Output: [{'id': '1', 'name': None}, {'id': None, 'name': 'Ada'}]

Explanation: Missing requested columns should appear with value None.

Input: ([{'id': '1'}, {'id': '2'}], [])

Expected Output: [{}, {}]

Explanation: Selecting zero columns returns one empty row per input row.

Solution

def solution(rows, selected_columns):
    result = []
    for row in rows:
        projected = {}
        for column in selected_columns:
            projected[column] = row.get(column, None)
        result.append(projected)
    return result

Time complexity: O(n * k), where n is the number of rows and k is the number of selected columns. Space complexity: O(n * k) for the output table.

Hints

  1. Build a fresh output row for each input row instead of deleting keys from the original.
  2. A dictionary get lookup is useful when a selected column may be missing.

Part 2: Filter Rows with Simple WHERE Conditions

You are given a table as a list of row dictionaries and a list of simple WHERE conditions. Each condition is a tuple of the form (column, operator, value). Return all rows that satisfy every condition. Supported operators are '=', '!=', '<', '<=', '>', and '>='. Preserve the original row order. If a row is missing a referenced column, treat it as not matching.

Constraints

  • 0 <= len(rows) <= 10000
  • 0 <= len(conditions) <= 20
  • Each operator is one of '=', '!=', '<', '<=', '>', '>='
  • Values compared within a condition are mutually comparable

Examples

Input: ([{'id': 1, 'age': 36}, {'id': 2, 'age': 28}, {'id': 3, 'age': 36}], [('age', '=', 36)])

Expected Output: [{'id': 1, 'age': 36}, {'id': 3, 'age': 36}]

Explanation: Only rows with age equal to 36 should remain.

Input: ([{'id': 1, 'age': 36, 'name': 'Ada'}, {'id': 2, 'age': 28, 'name': 'Bob'}, {'id': 3, 'age': 40, 'name': 'Ada'}], [('name', '=', 'Ada'), ('age', '>', 36)])

Expected Output: [{'id': 3, 'age': 40, 'name': 'Ada'}]

Explanation: A row must satisfy both conditions.

Input: ([{'id': '1', 'birthday': '1815-12-10'}, {'id': '2', 'birthday': '1791-12-26'}], [('birthday', '<', '1800-01-01')])

Expected Output: [{'id': '2', 'birthday': '1791-12-26'}]

Explanation: ISO date strings can be compared lexicographically.

Input: ([{'id': 1}, {'id': 2}], [])

Expected Output: [{'id': 1}, {'id': 2}]

Explanation: No WHERE conditions means every row matches.

Input: ([{'id': 1}, {'id': 2, 'name': 'Ada'}], [('name', '=', 'Ada')])

Expected Output: [{'id': 2, 'name': 'Ada'}]

Explanation: Rows missing the filtered column do not match.

Solution

def solution(rows, conditions):
    def matches(row):
        for column, operator, value in conditions:
            if column not in row:
                return False
            current = row[column]
            if operator == '=':
                ok = current == value
            elif operator == '!=':
                ok = current != value
            elif operator == '<':
                ok = current < value
            elif operator == '<=':
                ok = current <= value
            elif operator == '>':
                ok = current > value
            elif operator == '>=':
                ok = current >= value
            else:
                raise ValueError('Unsupported operator: ' + str(operator))
            if not ok:
                return False
        return True

    return [dict(row) for row in rows if matches(row)]

Time complexity: O(n * m), where n is the number of rows and m is the number of conditions. Space complexity: O(r), where r is the number of matching rows returned.

Hints

  1. Write a helper that checks whether one row satisfies all conditions.
  2. An empty list of conditions should match every row.

Part 3: Sort Rows with ORDER BY on Multiple Columns

You are given a table as a list of row dictionaries. Sort the rows using ORDER BY rules on one or more columns. The columns are given in priority order, and each column has its own ascending or descending flag. Return the sorted rows without modifying the originals. If no order columns are provided, return the rows in their original order.

Constraints

  • 0 <= len(rows) <= 10000
  • 0 <= len(order_columns) <= 10
  • len(order_columns) == len(ascending_flags)
  • Every order column exists in every row and contains comparable values

Examples

Input: ([{'id': 1, 'name': 'Charles'}, {'id': 2, 'name': 'Ada'}, {'id': 3, 'name': 'Bob'}], ['name'], [True])

Expected Output: [{'id': 2, 'name': 'Ada'}, {'id': 3, 'name': 'Bob'}, {'id': 1, 'name': 'Charles'}]

Explanation: Sort by name ascending.

Input: ([{'id': 1, 'age': 30}, {'id': 2, 'age': 20}, {'id': 3, 'age': 40}], ['age'], [False])

Expected Output: [{'id': 3, 'age': 40}, {'id': 1, 'age': 30}, {'id': 2, 'age': 20}]

Explanation: Sort by age descending.

Input: ([{'id': 1, 'age': 30, 'name': 'Bob'}, {'id': 2, 'age': 30, 'name': 'Ada'}, {'id': 3, 'age': 25, 'name': 'Zoe'}, {'id': 4, 'age': 30, 'name': 'Ada'}], ['age', 'name', 'id'], [True, True, False])

Expected Output: [{'id': 3, 'age': 25, 'name': 'Zoe'}, {'id': 4, 'age': 30, 'name': 'Ada'}, {'id': 2, 'age': 30, 'name': 'Ada'}, {'id': 1, 'age': 30, 'name': 'Bob'}]

Explanation: First sort by age ascending, then name ascending, then id descending to break ties.

Input: ([], ['name'], [True])

Expected Output: []

Explanation: Sorting an empty table returns an empty table.

Input: ([{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Bob'}], [], [])

Expected Output: [{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Bob'}]

Explanation: No ORDER BY columns means keep the original order.

Solution

def solution(rows, order_columns, ascending_flags):
    if len(order_columns) != len(ascending_flags):
        raise ValueError('order_columns and ascending_flags must have the same length')

    result = [dict(row) for row in rows]
    for column, ascending in reversed(list(zip(order_columns, ascending_flags))):
        result.sort(key=lambda row: row[column], reverse=not ascending)
    return result

Time complexity: O(k * n log n), where n is the number of rows and k is the number of ORDER BY columns. Space complexity: O(n) for the copied output rows.

Hints

  1. Python's sort is stable, so sorting from the last key to the first key is a clean way to handle mixed directions.
  2. Make a copy of the rows before sorting if you do not want to mutate the input.

Part 4: Build a Simple Equality Index for Fast Lookups

This coding version focuses on a simplified index design. In many in-memory databases, repeated equality filters on one column can be accelerated with a hash index. Given table rows, an index column, and lookup values, build an index from column value to row positions, then return the matching row positions for each lookup value. Ignore rows that do not contain the indexed column.

Constraints

  • 0 <= len(rows) <= 100000
  • 0 <= len(lookup_values) <= 100000
  • Indexed values are hashable
  • Ignore rows missing the indexed column

Examples

Input: ([{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Charles'}, {'id': '3', 'name': 'Ada'}], 'name', ['Ada', 'Charles', 'Eve'])

Expected Output: [[0, 2], [1], []]

Explanation: The index groups all row positions for each name.

Input: ([{'id': 1, 'age': 36}, {'id': 2, 'age': 28}, {'id': 3, 'age': 36}, {'id': 4, 'age': 40}], 'age', [40, 36])

Expected Output: [[3], [0, 2]]

Explanation: Multiple rows can share the same indexed value.

Input: ([{'id': 1}, {'id': 2, 'city': 'Paris'}, {'id': 3, 'city': 'Paris'}], 'city', ['Paris', 'London'])

Expected Output: [[1, 2], []]

Explanation: Rows missing the indexed column are ignored.

Input: ([], 'name', ['Ada'])

Expected Output: [[]]

Explanation: An empty table yields no matches for any lookup.

Solution

def solution(rows, index_column, lookup_values):
    index = {}
    for i, row in enumerate(rows):
        if index_column in row:
            key = row[index_column]
            index.setdefault(key, []).append(i)

    return [list(index.get(value, [])) for value in lookup_values]

Time complexity: O(n + q + total_output_size) on average, where n is the number of rows and q is the number of lookups. Space complexity: O(n) for the index.

Hints

  1. Build the mapping from value to row positions once, then answer each lookup in O(1) average time.
  2. If a value appears multiple times, store all matching positions, not just one.
Last updated: Apr 25, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Simulate Infection Spread on a Grid - OpenAI (hard)
  • Implement Social Follow Recommendations - OpenAI (medium)
  • Build a Compose Rating Card - OpenAI (medium)
  • Generate Data Labeling Schedules - OpenAI (medium)
  • Convert IPv4 Ranges to CIDR Blocks - OpenAI (medium)