Validate a Binary Tree Built from Directed Edges
Company: Optiver
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Take-home Project
# Validate a Binary Tree Built from Directed Edges
You are given a list of directed edges. Each edge is a pair `(parent, child)` of positive-integer node labels, meaning `parent` points to `child`. The set of nodes is exactly the set of labels that appear in at least one edge.
Determine whether the edges form a single **valid binary tree**:
- Exactly one root (a node with no parent).
- Every non-root node has exactly one parent.
- No node has more than two children.
- There are no cycles, and every node is reachable from the root (equivalently, the edges form one connected tree, not a forest).
If the edges do **not** form a valid binary tree, output the **first** error code that applies from the priority list below (check in this exact order and report only the highest-priority error):
1. `INVALID_INPUT` — some edge is a self-loop (`parent == child`), or the exact same `(parent, child)` edge appears more than once.
2. `TOO_MANY_CHILDREN` — some node appears as the parent in more than two edges.
3. `MULTIPLE_PARENTS` — some node appears as the child in more than one edge.
4. `CYCLE` — the edges contain a directed cycle.
5. `MULTIPLE_ROOTS` — more than one node has no parent (the edges form a forest of two or more trees).
If the edges **do** form a valid binary tree, output its **in-order traversal serialization**: visit the left subtree, then the node, then the right subtree, and print the node labels separated by single spaces.
## Left/right child rule
Edges are given in a fixed order. For any parent node:
- The **first** edge listed with that node as parent attaches its **left** child.
- The **second** edge listed with that node as parent attaches its **right** child.
- A node with only one child edge has only a **left** child.
## Input
- `edges`: a list of pairs `[parent, child]` of positive integers, in a fixed given order.
## Output
Return a single string:
- one of the five error codes above (exactly as spelled, uppercase) if the edges do not form a valid binary tree, or
- the space-separated in-order traversal of the tree if they do.
## Examples
**Example 1 — valid tree**
```
edges = [[1,2],[1,3],[2,4],[3,5]]
```
Node `1` has left child `2` and right child `3`; node `2` has left child `4`; node `3` has left child `5`.
Output: `4 2 1 5 3`
**Example 2 — cycle**
```
edges = [[1,2],[2,3],[3,1]]
```
Every node has exactly one parent and at most two children, but the edges form a directed cycle.
Output: `CYCLE`
**Example 3 — too many children**
```
edges = [[1,2],[1,3],[1,4]]
```
Node `1` has three children.
Output: `TOO_MANY_CHILDREN`
**Example 4 — multiple roots**
```
edges = [[1,2],[3,4]]
```
Nodes `1` and `3` both have no parent, so the edges form a forest.
Output: `MULTIPLE_ROOTS`
**Example 5 — duplicate edge**
```
edges = [[1,2],[1,2]]
```
The edge `(1,2)` appears twice.
Output: `INVALID_INPUT`
**Example 6 — multiple parents**
```
edges = [[1,2],[1,3],[2,4],[3,4]]
```
Node `4` has two parents (`2` and `3`).
Output: `MULTIPLE_PARENTS`
## Constraints
- `1 <= number of edges <= 10^4`
- `1 <= node label <= 10^5`
- Node labels within the tree are distinct identifiers; the same label never denotes two different nodes.
- The error priority order must be respected: when several problems exist simultaneously, report only the highest-priority one.
Quick Answer: This question evaluates understanding of tree and graph fundamentals including parent-child constraints, duplicate-edge detection, cycle detection, node-degree limits, and in-order traversal semantics.
You are given a list of directed edges, each a pair [parent, child] of positive-integer node labels, in a fixed given order. The set of nodes is exactly the labels that appear in at least one edge. Determine whether the edges form a single valid binary tree: exactly one root (a node with no parent), every non-root node has exactly one parent, no node has more than two children, no cycles, and every node is reachable from the root (one connected tree, not a forest).
If the edges do NOT form a valid binary tree, return the FIRST error code that applies, checked in this exact priority order (report only the highest-priority one):
1. INVALID_INPUT - some edge is a self-loop (parent == child), or the exact same (parent, child) edge appears more than once.
2. TOO_MANY_CHILDREN - some node appears as the parent in more than two edges.
3. MULTIPLE_PARENTS - some node appears as the child in more than one edge.
4. CYCLE - the edges contain a directed cycle.
5. MULTIPLE_ROOTS - more than one node has no parent (the edges form a forest).
If the edges DO form a valid binary tree, return its in-order traversal serialization: left subtree, node, right subtree, with labels separated by single spaces.
Left/right child rule: edges are given in a fixed order. For any parent, the FIRST edge listed with that node as parent attaches its LEFT child; the SECOND attaches its RIGHT child; a node with only one child edge has only a left child.
Constraints
- 1 <= number of edges <= 10^4
- 1 <= node label <= 10^5
- Node labels are distinct identifiers; the same label never denotes two different nodes.
- When several problems exist simultaneously, report only the highest-priority error code.
- Edge order is significant: it determines left- vs right-child assignment.
Examples
Input: ([[1,2],[1,3],[2,4],[3,5]],)
Expected Output: '4 2 1 5 3'
Explanation: Valid tree: 1 -> left 2, right 3; 2 -> left 4; 3 -> left 5. In-order = 4 2 1 5 3.
Input: ([[1,2],[2,3],[3,1]],)
Expected Output: 'CYCLE'
Explanation: Every node has one parent and <=2 children, but 1->2->3->1 forms a directed cycle.
Input: ([[1,2],[1,3],[1,4]],)
Expected Output: 'TOO_MANY_CHILDREN'
Explanation: Node 1 is the parent of three edges (2,3,4), exceeding two children.
Input: ([[1,2],[3,4]],)
Expected Output: 'MULTIPLE_ROOTS'
Explanation: Nodes 1 and 3 both have no parent, so the edges form a forest of two trees.
Input: ([[1,2],[1,2]],)
Expected Output: 'INVALID_INPUT'
Explanation: The exact edge (1,2) appears twice, which is invalid input (highest priority).
Input: ([[1,2],[1,3],[2,4],[3,4]],)
Expected Output: 'MULTIPLE_PARENTS'
Explanation: Node 4 appears as a child of both 2 and 3, so it has multiple parents.
Input: ([[1,2]],)
Expected Output: '2 1'
Explanation: Single edge: root 1 with only a left child 2. In-order = 2 1.
Input: ([[1,1]],)
Expected Output: 'INVALID_INPUT'
Explanation: Self-loop (1,1) is invalid input regardless of any other issue.
Input: ([[1,2],[1,3],[1,4],[5,4]],)
Expected Output: 'TOO_MANY_CHILDREN'
Explanation: Priority test: node 1 has three children (TOO_MANY_CHILDREN) outranks node 4's two parents.
Input: ([[10,20],[10,30],[20,40],[20,50],[30,60]],)
Expected Output: '40 20 50 10 60 30'
Explanation: Valid tree with two full levels; verifies left-then-right edge ordering in the traversal.
Hints
- Run the five checks in the exact priority order given; return on the first one that fires so tie-breaks resolve correctly.
- INVALID_INPUT covers two cases: parent == child (self-loop) and an exact (parent, child) pair seen more than once. Use a set of (parent, child) tuples.
- Count how many edges each node parents (>2 => TOO_MANY_CHILDREN) and how many times each node is a child (>1 => MULTIPLE_PARENTS).
- For CYCLE, a directed DFS with white/gray/black coloring detects a back edge to a gray node. Do it iteratively to stay within recursion limits at 10^4 nodes.
- The root is the unique node that never appears as a child. After the earlier checks pass, more than one such node means MULTIPLE_ROOTS.
- Store children in edge order: index 0 is the left child, index 1 the right child, then run an iterative in-order traversal.