Implement an In-Memory Query Engine
Company: Valon
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates skills in data modeling, API design, algorithmic filtering, and implementation of an in-memory query execution engine, including membership predicates, general filter logic, and handling edge cases.
Part 1: Basic Select from an In-Memory Table
Constraints
- 1 <= number of tables <= 100
- 0 <= number of rows in a table <= 10^4
- Each table's 'columns' list contains unique names
- Row values are primitive Python values such as int, str, bool, or None
- If any selected column is not present in the table schema, the query is invalid
Examples
Input: ({'users': {'columns': ['id', 'name', 'age'], 'rows': [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': 25}]}}, 'users', ['name', 'age'])
Expected Output: {'rows': [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]}
Explanation: Basic projection: keep only the name and age fields from every row.
Input: ({'users': {'columns': ['id', 'name'], 'rows': []}}, 'users', ['name'])
Expected Output: {'rows': []}
Explanation: Edge case: selecting from an empty table should return an empty result set.
Input: ({}, 'orders', ['id'])
Expected Output: {'error': "table 'orders' not found"}
Explanation: If the table does not exist, return an error.
Input: ({'users': {'columns': ['id', 'name'], 'rows': [{'id': 1, 'name': 'Alice'}]}}, 'users', ['name', 'email'])
Expected Output: {'error': "unknown columns: ['email']"}
Explanation: Selecting a column not present in the schema is invalid.
Hints
- Validate the table name and all requested columns before scanning rows.
- Build a brand-new dictionary for each row so the output contains only the requested fields.
Part 2: Select with Membership Filtering
Constraints
- 1 <= number of tables <= 100
- 0 <= number of rows in a table <= 10^4
- 0 <= number of allowed values <= 10^4
- The filter column and all selected columns must exist in the table schema
- Membership checking should be efficient for large allowed-value collections
Examples
Input: ({'users': {'columns': ['id', 'name', 'age'], 'rows': [{'id': 1, 'name': 'Alice', 'age': 30}, {'id': 2, 'name': 'Bob', 'age': 25}, {'id': 3, 'name': 'Carol', 'age': 40}]}}, 'users', ['id', 'name'], 'name', ['Alice', 'Bob'])
Expected Output: {'rows': [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]}
Explanation: Only rows whose name is Alice or Bob should be returned.
Input: ({'users': {'columns': ['id', 'name'], 'rows': [{'id': 1, 'name': 'Alice'}]}}, 'users', ['name'], 'name', [])
Expected Output: {'rows': []}
Explanation: Edge case: an empty allowed-value collection means no row can match.
Input: ({'users': {'columns': ['id', 'name'], 'rows': [{'id': 1, 'name': 'Alice'}]}}, 'users', ['name'], 'city', ['Seattle'])
Expected Output: {'error': "unknown filter column: 'city'"}
Explanation: The filter column must exist in the table schema.
Input: ({}, 'users', ['name'], 'name', ['Alice'])
Expected Output: {'error': "table 'users' not found"}
Explanation: Missing table should return an error.
Hints
- Convert the allowed-values collection into a set before scanning rows.
- Filter rows first, then project only the requested columns for the rows that match.
Part 3: Select with General Filtering
Constraints
- 1 <= number of tables <= 100
- 0 <= number of rows in a table <= 10^4
- 0 <= number of filters <= 20
- Supported operators are '==', '!=', '>', '>=', '<', '<=', and 'in'
- For valid ordered comparisons, the filter value type is compatible with the column values being compared
Examples
Input: ({'users': {'columns': ['id', 'name', 'age', 'city'], 'rows': [{'id': 1, 'name': 'Alice', 'age': 31, 'city': 'Seattle'}, {'id': 2, 'name': 'Bob', 'age': 28, 'city': 'Seattle'}, {'id': 3, 'name': 'Carol', 'age': 35, 'city': 'Portland'}, {'id': 4, 'name': 'Dave', 'age': 40, 'city': 'Seattle'}]}}, 'users', ['name'], [{'column': 'age', 'op': '>', 'value': 30}, {'column': 'city', 'op': '==', 'value': 'Seattle'}])
Expected Output: {'rows': [{'name': 'Alice'}, {'name': 'Dave'}]}
Explanation: Both filters must pass, so only users older than 30 and living in Seattle remain.
Input: ({'users': {'columns': ['id', 'name', 'age', 'city'], 'rows': [{'id': 1, 'name': 'Alice', 'age': 31, 'city': 'Seattle'}, {'id': 2, 'name': 'Bob', 'age': 28, 'city': 'Seattle'}, {'id': 3, 'name': 'Carol', 'age': 35, 'city': 'Portland'}, {'id': 4, 'name': 'Dave', 'age': 40, 'city': 'Seattle'}]}}, 'users', ['id'], [])
Expected Output: {'rows': [{'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]}
Explanation: Edge case: no filters means all rows are returned after projection.
Input: ({'users': {'columns': ['id', 'name', 'age'], 'rows': [{'id': 1, 'name': 'Alice', 'age': 31}, {'id': 2, 'name': 'Bob', 'age': 28}]}}, 'users', ['name'], [{'column': 'age', 'op': '>', 'value': 100}])
Expected Output: {'rows': []}
Explanation: If no row satisfies the filters, return an empty result set.
Input: ({'users': {'columns': ['id', 'name'], 'rows': [{'id': 1, 'name': 'Alice'}]}}, 'users', ['name'], [{'column': 'country', 'op': '==', 'value': 'US'}])
Expected Output: {'error': "unknown filter column: 'country'"}
Explanation: Every filter column must exist in the table schema.
Input: ({'users': {'columns': ['id', 'name'], 'rows': [{'id': 1, 'name': 'Alice'}]}}, 'users', ['name'], [{'column': 'name', 'op': 'contains', 'value': 'A'}])
Expected Output: {'error': "unsupported operator: 'contains'"}
Explanation: Unsupported filter operators should be rejected.
Hints
- Normalize and validate all filters before scanning the rows.
- Evaluate filters with short-circuit logic: as soon as one filter fails for a row, skip the remaining filters.