Build and Validate a Binary Tree from Parent-Child Pairs
Company: Optiver
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Take-home Project
## Build and Validate a Binary Tree from Parent-Child Pairs
You are given a single input string that is supposed to describe a binary tree as a sequence of parent-child pairs, for example:
```
(A,B) (B,C) (A,D)
```
Each pair has the form `(P,C)`, meaning node `P` is the parent of node `C`. Node values are single uppercase English letters `A`–`Z`. Your task is to validate the input and, if it describes a valid binary tree, print the tree's S-expression. If the input is invalid, print the highest-priority error code instead.
### Input format (strict)
A *well-formed* input string consists of one or more pairs, where:
- Each pair is exactly `(` + one uppercase letter + `,` + one uppercase letter + `)` — no spaces inside a pair.
- Consecutive pairs are separated by exactly one space.
- There is no leading or trailing whitespace, and no other characters appear anywhere.
The input may be **any** arbitrary string, so it must be checked carefully against this format. Any deviation from the format above (wrong characters, lowercase letters, multi-character node names, missing parentheses or commas, extra/missing spaces, empty string, etc.) is a format error.
### Errors
After format validation, check the described graph. Report exactly one error code — if several errors are present simultaneously, report the one with the highest priority. `E1` is the highest priority and `E5` is the lowest:
- **E1 — Invalid Input String**: the input string does not match the strict format described above.
- **E2 — Duplicate Pair**: the exact same pair `(P,C)` appears more than once.
- **E3 — Parent Has More than 2 Children**: some node appears as the parent in more than two (distinct) pairs.
- **E4 — Multiple Roots**: more than one node has no parent.
- **E5 — Cycle In The Tree**: the edges do not form a valid tree because some node can be reached in more than one way — there is a directed cycle (including a self-pair such as `(A,A)`), or some node has more than one parent.
If any error applies, output only its code, e.g.:
```
E3
```
### Output for valid input
If the input passes all checks, it describes a single binary tree. Print its S-expression, defined recursively as:
```
S(node) = "(" + node value + S(first child) + S(second child) + ")"
```
where a missing child contributes the empty string, and the children of each node are ordered so that the smaller letter (lexicographically) comes first. There are no spaces in the output.
### Examples
**Example 1**
```
Input: (A,B) (B,C) (A,D)
Output: (A(B(C))(D))
```
`A` is the root with children `B` and `D` (`B` printed first); `B` has one child `C`.
**Example 2**
```
Input: (A,B) (A,C) (B,D) (D,C)
Output: E5
```
`C` has two parents (`A` and `D`), so the structure is not a valid tree.
**Example 3**
```
Input: (A,B) (A,C) (B,D) (A,E)
Output: E3
```
`A` has three children.
**Example 4**
```
Input: (A,B) (b,C)
Output: E1
```
Lowercase `b` violates the input format.
**Example 5**
```
Input: (A,B) (A,B) (B,C)
Output: E2
```
The pair `(A,B)` appears twice.
### Constraints
- The input string length is at most $10^4$ characters.
- Node values are limited to the 26 uppercase letters, so a valid tree has at most 26 nodes.
- Your solution must return the error checks in strict priority order: `E1 > E2 > E3 > E4 > E5`.
Quick Answer: This question evaluates understanding of tree and graph concepts, strict string parsing and validation, duplicate and cycle detection, and the construction of canonical S-expressions within the Coding & Algorithms category.
You are given a single input string that is supposed to describe a binary tree as a sequence of parent-child pairs, for example `(A,B) (B,C) (A,D)`. Each pair `(P,C)` means node `P` is the parent of node `C`; node values are single uppercase letters `A`-`Z`. Return the tree's S-expression if the input is a valid binary tree, otherwise return the highest-priority error code.
Strict input format: one or more pairs, each exactly `(` + uppercase letter + `,` + uppercase letter + `)`, consecutive pairs separated by exactly one space, no leading/trailing whitespace, and no other characters anywhere. Any deviation is a format error.
Report exactly one error code, in strict priority order E1 > E2 > E3 > E4 > E5:
- E1 - Invalid Input String: the input does not match the strict format.
- E2 - Duplicate Pair: the exact same pair `(P,C)` appears more than once.
- E3 - Parent Has More than 2 Children: some node is the parent in more than two distinct pairs.
- E4 - Multiple Roots: more than one node has no parent.
- E5 - Cycle In The Tree: some node has more than one parent, or the edges contain a directed cycle (including a self-pair such as `(A,A)`).
For valid input, return the S-expression S(node) = "(" + node value + S(first child) + S(second child) + ")", where a missing child contributes the empty string and children are ordered so the smaller letter comes first. There are no spaces in the output. Example: `(A,B) (B,C) (A,D)` -> `(A(B(C))(D))`.
Constraints
- The input string length is at most 10^4 characters.
- Node values are limited to the 26 uppercase letters A-Z, so a valid tree has at most 26 nodes.
- The input may be any arbitrary string and must be validated strictly against the format.
- Exactly one output is produced: either the S-expression or a single error code.
- Errors are reported in strict priority order E1 > E2 > E3 > E4 > E5.
Examples
Input: ('(A,B) (B,C) (A,D)',)
Expected Output: '(A(B(C))(D))'
Explanation: Valid tree: root A has children B and D (B first, smaller letter); B has child C.
Input: ('(A,B) (A,C) (B,D) (D,C)',)
Expected Output: 'E5'
Explanation: C has two parents (A and D), so the graph is not a valid tree.
Input: ('(A,B) (A,C) (B,D) (A,E)',)
Expected Output: 'E3'
Explanation: A appears as parent of B, C and E - more than two children.
Input: ('(A,B) (b,C)',)
Expected Output: 'E1'
Explanation: Lowercase 'b' violates the strict input format.
Input: ('(A,B) (A,B) (B,C)',)
Expected Output: 'E2'
Explanation: The pair (A,B) appears twice - a duplicate pair.
Input: ('(A,B) (C,D)',)
Expected Output: 'E4'
Explanation: Two disjoint trees: both A and C have no parent, so there are multiple roots.
Input: ('(A,A)',)
Expected Output: 'E5'
Explanation: A self-pair (A,A) is a directed cycle.
Input: ('(A,B) (B,A)',)
Expected Output: 'E5'
Explanation: A->B->A is a directed cycle; no node has in-degree 0.
Input: ('(A,B)',)
Expected Output: '(A(B))'
Explanation: Single edge: root A with one child B.
Input: ('',)
Expected Output: 'E1'
Explanation: Empty string has no pairs - invalid format.
Input: ('(A,B) (B,C) (C,D) (D,E) (E,B)',)
Expected Output: 'E5'
Explanation: B has parents A and E - more than one parent (and a cycle B->C->D->E->B).
Input: ('(A,B) (A,C)',)
Expected Output: '(A(B)(C))'
Explanation: Root A has exactly two children B and C, printed in lexicographic order.
Hints
- Validate the whole string first with a strict pattern (one or more `(X,Y)` pairs separated by single spaces) before parsing anything - any mismatch is E1.
- Process the checks strictly in priority order and return on the first that applies: duplicate pair (E2), then out-degree > 2 (E3), then count of parent-less nodes (E4), then multi-parent or cycle (E5).
- A node has no parent when its in-degree is 0. More than one such node means multiple roots (E4); zero such nodes means every node has an incoming edge, which forces a cycle (E5).
- With exactly one root and every in-degree <= 1, a directed cycle would leave some nodes unreachable from the root - traverse from the root and compare the visited count to the total node count.
- To print the S-expression, sort each node's children so the smaller letter is emitted first, then recurse: `"(" + value + build(child1) + build(child2) + ")"`.