What difficulty level is this interview question?

This is a hard-difficulty System Design question, commonly asked during Technical Screen rounds at Retell.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Retell during technical interviews.

Design and Implement a Mini SQL Query Engine

This question evaluates a candidate's ability to design and build a small SQL query engine, covering tokenization, parsing, and query execution over in-memory tables. It tests system design and compiler-style thinking, including how to scope a large problem into an extensible v1 and reason about extending it with sorting, aggregation, and joins. This is a system design and practical implementation question assessing both architectural judgment and coding ability.

You're given a set of in-memory tables — each table is a collection of rows over a known, typed schema (e.g., a table is a list of row objects, and each column has a type such as integer, string, or null). Design and implement a small SQL query engine: it takes a SQL query string, executes it against the in-memory tables, and returns the result set.

Start from a basic subset of SQL and be ready to extend it as the interviewer adds features. You cannot implement all of SQL in an interview window, so an explicit part of this exercise is scoping: decide which subset to support first, state your boundaries out loud, confirm them with the interviewer, and design so that new clauses slot in cleanly rather than forcing a rewrite.

Constraints & Assumptions

Tables are fully in memory (e.g., tables[name] = [{col: value, ...}, ...]) with a known schema (column names + types). Data fits in memory.
Read-only queries (SELECT); no transactions, no persistence, no writes in the base version.
v1 target: SELECT <columns> FROM <table> WHERE <predicate>, where the predicate supports the comparison operators (=, !=, <, <=, >, >=) combined with AND / OR.
Extensions to be ready for: ORDER BY + LIMIT, aggregate functions (COUNT, SUM, AVG, MIN, MAX) with GROUP BY, and INNER JOIN of two tables.
Keywords may be treated case-insensitively; assume a simplified, well-formed grammar unless validation is explicitly requested.
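To make the input shape concrete, here is a minimal Python sketch of the in-memory tables and their schema. The table name `users`, its columns, the `SCHEMAS` layout, and the `validate_table` helper are all illustrative assumptions; the question only states that each table is a list of rows over a known, typed schema.

```python
from typing import Any, Dict, List

# Hypothetical schema registry: table name -> {column name -> Python type}.
SCHEMAS: Dict[str, Dict[str, type]] = {
    "users": {"id": int, "name": str, "age": int},
}

# Table data in the shape the constraints describe: a list of row dicts.
# None stands in for SQL NULL.
TABLES: Dict[str, List[Dict[str, Any]]] = {
    "users": [
        {"id": 1, "name": "Ada", "age": 36},
        {"id": 2, "name": "Linus", "age": None},
        {"id": 3, "name": "Grace", "age": 45},
    ],
}


def validate_table(name: str, tables, schemas) -> List[str]:
    """Return a list of problems; an empty list means every row fits the schema."""
    problems = []
    schema = schemas[name]
    for i, row in enumerate(tables[name]):
        if set(row) != set(schema):
            problems.append(f"row {i}: columns {sorted(row)} != {sorted(schema)}")
            continue
        for col, typ in schema.items():
            value = row[col]
            # NULL (None) is allowed in any column in this sketch.
            if value is not None and not isinstance(value, typ):
                problems.append(f"row {i}: {col}={value!r} is not {typ.__name__}")
    return problems
```

Keeping the schema separate from the rows lets a later validation step reject unknown tables or columns without scanning any data.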

Clarifying Questions to Ask

Which subset is in scope for v1, and how far do we extend — joins, grouping, subqueries? Where should I stop?
What column types exist, and how are NULLs handled in comparisons and in aggregates (SQL three-valued logic, or a simplified rule)?
Should I validate the query against the schema (error on unknown table/column), or assume valid input?
Is the input always a single statement? Can I assume a simplified grammar (no nested expressions in the select list to start)?
What is the expected output shape — an ordered list of rows with a defined column order — and how are ORDER BY ties broken?
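The NULL question above changes how predicates evaluate, so it helps to see what the three-valued option means in code. The sketch below models SQL's UNKNOWN as Python `None`; the function names `compare`, `and3`, `or3`, and `keep_row` are hypothetical, and the interviewer may well accept a simpler rule instead.

```python
import operator
from typing import Any, Optional

OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def compare(op: str, left: Any, right: Any) -> Optional[bool]:
    """Comparison under three-valued logic: any NULL operand gives UNKNOWN (None)."""
    if left is None or right is None:
        return None
    return OPS[op](left, right)


def and3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """AND: a single FALSE decides the result even if the other side is UNKNOWN."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """OR: a single TRUE decides the result even if the other side is UNKNOWN."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def keep_row(result: Optional[bool]) -> bool:
    """WHERE keeps a row only when the predicate is TRUE, not UNKNOWN."""
    return result is True
```

Under this rule, `age > 30` and `age <= 30` both drop a row whose `age` is NULL, which is worth saying out loud before choosing between this and a simplified rule.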

Part 1 — Architecture: how the engine is structured

Before writing code, lay out the engine's stages and the boundaries between them. What are the components, what does each one consume and produce, and where would future features (JOIN, GROUP BY) plug in?
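One possible way to structure the execution side is an iterator-style operator tree, where each operator consumes rows from its child and yields rows to its parent. The sketch below is an assumption about a reasonable design rather than the expected answer; the class names `Scan`, `Filter`, `Project`, and `Limit` are hypothetical.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


class Scan:
    """Leaf operator: yields the rows of one in-memory table."""
    def __init__(self, rows: List[Row]):
        self.rows = rows

    def __iter__(self) -> Iterator[Row]:
        return iter(self.rows)


class Filter:
    """Yields only the child rows for which the predicate returns True."""
    def __init__(self, child: Iterable[Row], predicate: Callable[[Row], bool]):
        self.child, self.predicate = child, predicate

    def __iter__(self) -> Iterator[Row]:
        return (row for row in self.child if self.predicate(row))


class Project:
    """Keeps the requested columns, in the requested order."""
    def __init__(self, child: Iterable[Row], columns: List[str]):
        self.child, self.columns = child, columns

    def __iter__(self) -> Iterator[Row]:
        for row in self.child:
            yield {c: row[c] for c in self.columns}


class Limit:
    """Stops after n rows; a later feature that wraps the tree without changing it."""
    def __init__(self, child: Iterable[Row], n: int):
        self.child, self.n = child, n

    def __iter__(self) -> Iterator[Row]:
        return islice(iter(self.child), self.n)


# A plan for: SELECT name FROM users WHERE age > 30 LIMIT 1
users = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Bob", "age": 25},
    {"id": 3, "name": "Grace", "age": 45},
]
plan = Limit(Project(Filter(Scan(users), lambda r: r["age"] > 30), ["name"]), 1)
result = list(plan)
```

With this shape, the parser's only job is to produce an AST, a small planner turns the AST into an operator tree, and a sort, join, or aggregate becomes one more class with the same `__iter__` contract.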

Part 2 — Implement the core: SELECT … FROM … WHERE

Implement the base engine: parse and execute SELECT col1, col2 FROM table WHERE <predicate> over the in-memory tables, supporting the comparison operators and AND / OR in the predicate. Show both the parsing and the execution.
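A minimal sketch of the v1 core might look like the following: a regex tokenizer, a recursive-descent parser in which AND binds tighter than OR, and a tree-walking evaluator. Several things here are assumptions beyond the prompt: the tuple-based AST, the function names, support for parentheses in predicates, integer and single-quoted string literals only, and a simplified NULL rule where any comparison involving `None` is false.

```python
import operator
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|'([^']*)'|(<=|>=|!=|=|<|>)|([A-Za-z_]\w*)|([,()]))")
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def tokenize(sql):
    sql, tokens, pos = sql.strip(), [], 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if not m:
            raise SyntaxError(f"unexpected input at position {pos}")
        num, text, op, word, punct = m.groups()
        if num is not None:
            tokens.append(("NUM", int(num)))
        elif text is not None:
            tokens.append(("STR", text))
        elif op:
            tokens.append(("OP", op))
        elif word:  # keywords are case-insensitive, identifiers keep their case
            upper = word.upper()
            tokens.append(("KW", upper) if upper in KEYWORDS else ("IDENT", word))
        else:
            tokens.append(("PUNCT", punct))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("EOF", None)

    def take(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok}")
        self.i += 1
        return tok[1]

    def parse_select(self):
        self.take("KW", "SELECT")
        columns = [self.take("IDENT")]
        while self.peek() == ("PUNCT", ","):
            self.take("PUNCT")
            columns.append(self.take("IDENT"))
        self.take("KW", "FROM")
        table = self.take("IDENT")
        where = None
        if self.peek() == ("KW", "WHERE"):
            self.take("KW")
            where = self.parse_or()
        self.take("EOF")  # reject trailing tokens
        return {"columns": columns, "table": table, "where": where}

    def parse_or(self):  # lowest precedence
        node = self.parse_and()
        while self.peek() == ("KW", "OR"):
            self.take("KW")
            node = ("or", node, self.parse_and())
        return node

    def parse_and(self):  # binds tighter than OR
        node = self.parse_cmp()
        while self.peek() == ("KW", "AND"):
            self.take("KW")
            node = ("and", node, self.parse_cmp())
        return node

    def parse_cmp(self):
        if self.peek() == ("PUNCT", "("):
            self.take("PUNCT")
            node = self.parse_or()
            self.take("PUNCT", ")")
            return node
        left = self.parse_operand()
        op = self.take("OP")
        return ("cmp", op, left, self.parse_operand())

    def parse_operand(self):
        kind, value = self.peek()
        if kind not in ("IDENT", "NUM", "STR"):
            raise SyntaxError(f"expected column or literal, got {(kind, value)}")
        self.i += 1
        return ("col", value) if kind == "IDENT" else ("lit", value)


def evaluate(node, row):
    tag = node[0]
    if tag == "col":
        return row[node[1]]
    if tag == "lit":
        return node[1]
    if tag == "and":
        return evaluate(node[1], row) and evaluate(node[2], row)
    if tag == "or":
        return evaluate(node[1], row) or evaluate(node[2], row)
    _, op, left, right = node
    lhs, rhs = evaluate(left, row), evaluate(right, row)
    # Simplified NULL rule (an assumption): a comparison with None is false.
    return lhs is not None and rhs is not None and OPS[op](lhs, rhs)


def execute(sql, tables):
    query = Parser(tokenize(sql)).parse_select()
    rows = tables[query["table"]]
    if query["where"] is not None:
        rows = [r for r in rows if evaluate(query["where"], r)]
    return [{c: r[c] for c in query["columns"]} for r in rows]
```

Precedence falls out of the call structure: `parse_or` calls `parse_and`, which calls `parse_cmp`, so `a AND b OR c` groups as `(a AND b) OR c`. The sketch does no schema validation, so an unknown table or column surfaces as a `KeyError`, and comparing mismatched types raises `TypeError`.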

Part 3 — Extend: ORDER BY, aggregation/GROUP BY, and JOIN

The interviewer keeps adding features. Extend the engine to support ORDER BY + LIMIT, aggregate functions with GROUP BY, and an INNER JOIN of two tables. Explain how each maps onto your operator model, and where you would set boundaries given limited time.
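Each of the three extensions can be sketched as a standalone function over lists of row dicts, independent of the parser. The names `order_by`, `group_by`, and `inner_join` and their signatures are hypothetical, and a few choices are assumptions to confirm with the interviewer: NULLs sort before other values in ascending order, ties keep input order, aggregates skip NULLs, and the join is a hash join on a single equality column.

```python
from collections import defaultdict


def order_by(rows, column, descending=False, limit=None):
    """Sort by one column. sorted() is stable, so ties keep their input order."""
    def key(row):
        value = row[column]
        # The (is-not-NULL, value) pair keeps None from being compared to real values.
        return (value is not None, value if value is not None else 0)
    ordered = sorted(rows, key=key, reverse=descending)
    return ordered if limit is None else ordered[:limit]


AGGREGATES = {
    "COUNT": lambda vals: len(vals),
    "SUM": lambda vals: sum(vals) if vals else None,
    "AVG": lambda vals: sum(vals) / len(vals) if vals else None,
    "MIN": lambda vals: min(vals) if vals else None,
    "MAX": lambda vals: max(vals) if vals else None,
}


def group_by(rows, key_column, func, column):
    """One output row per distinct key, e.g. SUM(total) GROUP BY user_id."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_column]].append(row)
    out = []
    for key, members in groups.items():
        # Aggregates ignore NULL inputs; an all-NULL group yields None (COUNT yields 0).
        values = [m[column] for m in members if m[column] is not None]
        out.append({key_column: key, f"{func}({column})": AGGREGATES[func](values)})
    return out


def inner_join(left, right, left_col, right_col, left_name, right_name):
    """Hash join: build an index on the right table, then probe with the left."""
    index = defaultdict(list)
    for r in right:
        if r[right_col] is not None:  # NULL never equals anything
            index[r[right_col]].append(r)
    out = []
    for l in left:
        for r in index.get(l[left_col], []):
            # Qualify column names so same-named columns do not collide.
            merged = {f"{left_name}.{k}": v for k, v in l.items()}
            merged.update({f"{right_name}.{k}": v for k, v in r.items()})
            out.append(merged)
    return out
```

In an operator model, sorting and grouping are blocking steps that must read all input before emitting anything, while the join needs only the build side in memory. A reasonable boundary under time pressure is one sort key, one grouping column, one aggregate per query, and equality joins only.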

Follow-up Questions

How would you support a JOIN of three or more tables, and how would join order affect performance?
How would you add HAVING and subqueries (e.g., WHERE col IN (SELECT ...)) to your AST and executor?
Where would a query optimizer fit, and what is one rewrite (e.g., predicate pushdown) that would help?
How would you test this engine so you stay confident it's correct as you keep adding features?
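For the testing question, one option is differential testing: run the same query through the engine and through SQLite (Python's standard `sqlite3` module) and compare the results. The sketch below assumes the engine is exposed as a callable `engine(sql, tables)` returning a list of row dicts, and that every table has at least one row; both helper names are hypothetical.

```python
import sqlite3


def sqlite_reference(sql, tables):
    """Load the in-memory tables into SQLite and run the query there."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, rows in tables.items():
            columns = list(rows[0])  # assumes a non-empty table
            conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
            placeholders = ", ".join("?" for _ in columns)
            conn.executemany(
                f"INSERT INTO {name} VALUES ({placeholders})",
                [tuple(row[c] for c in columns) for row in rows],
            )
        cursor = conn.execute(sql)
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]
    finally:
        conn.close()


def check_against_sqlite(engine, sql, tables, ordered=False):
    """True when the engine's rows match SQLite's for the same query."""
    expected = sqlite_reference(sql, tables)
    actual = engine(sql, tables)
    if ordered:  # use for queries with ORDER BY
        return expected == actual

    def canon(rows):
        # Compare as an unordered collection; repr avoids comparing None to values.
        return sorted(repr(sorted(r.items())) for r in rows)

    return canon(expected) == canon(actual)
```

SQLite follows SQL's three-valued NULL logic, so an engine that uses a simplified NULL rule can legitimately disagree on some predicates; those cases need hand-written expectations instead.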
