Implement an in-memory SQL-like table
Company: OpenAI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Onsite
Quick Answer: This question evaluates implementation skills for an in-memory SQL-like table, focusing on data modeling with nested key-value structures, string handling, efficient lookups, and lexicographic sorting with tie-breaking.
Constraints
- 0 <= len(commands) <= 2 * 10^5
- rowKey, colKey, whereCol, whereValue, orderByCol, and value are strings without spaces
- All comparisons and sorting are lexicographic string operations
- A SET command overwrites the previous value of the same cell, if any
Examples
Input: ["SET r1 name bob", "SET r1 age 2", "SET r2 name bob", "SET r2 age 10", "GET r1 name", "SELECT name bob age"]
Expected Output: ["bob", "r2 r1"]
Explanation: GET returns the stored value "bob". For SELECT, both rows match name=bob, and they are sorted by age as strings: "10" comes before "2" lexicographically, so r2 appears before r1.
Input: ["GET missing name", "SELECT status active score"]
Expected Output: ["NULL", ""]
Explanation: The table is empty. GET returns NULL, and SELECT has no matching rows so it returns an empty string.
Input: ["SET r1 city seattle", "SET r2 city seattle", "SET r2 score 9", "SELECT city seattle score", "SET r1 score 10", "SET r2 city boston", "SELECT city seattle score", "GET r2 city"]
Expected Output: ["r1 r2", "r1", "boston"]
Explanation: In the first SELECT, both rows match city=seattle. r1 is missing score, so its sort value is "", which comes before "9". After updating r2.city to boston, only r1 still matches city=seattle. The final GET returns boston.
Input: ["SET b role dev", "SET a role dev", "SET b rank mid", "SET a rank mid", "SELECT role dev rank"]
Expected Output: ["a b"]
Explanation: Both rows match role=dev and both have the same rank value "mid", so the tie is broken by rowKey ascending: a before b.
Input: ["SET r1 color red", "SET r1 color blue", "GET r1 color", "SELECT color red color", "SELECT color blue color"]
Expected Output: ["blue", "", "r1"]
Explanation: The second SET overwrites the old value red with blue. GET returns blue, SELECT for red finds no rows, and SELECT for blue returns r1.
Input: []
Expected Output: []
Explanation: With no commands, there are no outputs.
Solution
def solution(commands):
table = {}
index = {}
outputs = []
for command in commands:
parts = command.split()
op = parts[0]
if op == "SET":
row_key, col_key, value = parts[1], parts[2], parts[3]
row = table.setdefault(row_key, {})
if col_key in row:
old_value = row[col_key]
if old_value != value:
bucket = index[(col_key, old_value)]
bucket.remove(row_key)
if not bucket:
del index[(col_key, old_value)]
row[col_key] = value
index.setdefault((col_key, value), set()).add(row_key)
else:
row[col_key] = value
index.setdefault((col_key, value), set()).add(row_key)
elif op == "GET":
row_key, col_key = parts[1], parts[2]
value = table.get(row_key, {}).get(col_key)
outputs.append(value if value is not None else "NULL")
elif op == "SELECT":
where_col, where_value, order_by_col = parts[1], parts[2], parts[3]
matches = index.get((where_col, where_value))
if not matches:
outputs.append("")
else:
ordered_rows = sorted(
matches,
key=lambda row_key: (table[row_key].get(order_by_col, ""), row_key)
)
outputs.append(" ".join(ordered_rows))
return outputsTime complexity: SET/GET: O(1) average time. SELECT: O(k log k), where k is the number of rows matching (whereCol, whereValue).. Space complexity: O(c), where c is the number of currently stored cells, counting both the table storage and the exact-match index..
Hints
- Use a nested dictionary for the actual table: rowKey -> {colKey: value}.
- To avoid scanning every row for SELECT, keep an index from (colKey, value) to the set of rowKeys that currently match that exact value.