Optimize a SQL query plan tree
Company: Databricks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
## Problem
You are given an in-memory representation of a SQL query plan as a tree of nodes (a relational algebra plan). Each node is one of:
- `Scan(table)`
- `Filter(predicate, child)`
- `Project(columns, child)`
- `Join(join_type, condition, left, right)`
Assume predicates are boolean expressions over columns, and `Join` condition is an equi-join or conjunction of expressions.
### Task
Implement an **optimizer** function that takes the root of a query plan tree and returns a **semantically equivalent** plan that is “better” according to simple heuristics.
You should implement at least two classic rule-based optimizations, such as:
- **Predicate pushdown**: move `Filter` nodes as close to `Scan` as possible; split conjunctions and push parts down each side of a `Join` when the predicate only references columns from that side.
- **Projection pruning**: remove unused columns early by pushing `Project` down and trimming `Scan` outputs.
### Requirements
- The optimized plan must preserve query semantics.
- The output plan should be a valid tree in the same node format.
- Your code should handle nested combinations of `Filter`, `Project`, and `Join`.
### Notes
- Assume schemas for tables are known (columns per table).
- Predicates may reference columns from one side of a join or both sides.
- Keep the problem language-agnostic; provide working logic rather than just describing rules.
Quick Answer: This question evaluates understanding of relational algebra, query optimization techniques, and tree-based plan transformations such as predicate pushdown and projection pruning.