Implement toy-language types and generic substitution
Company: OpenAI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
## Problem: Toy Language Type System (Printing + Generic Resolution)
You are implementing a small type system for a custom “Toy Language”. Types can be:
- **Primitive types:** lowercase strings like `int`, `float`, `str`, `bool`
- **Generic type variables:** uppercase letter(s) followed by digits, e.g., `T1`, `T2`
- **Tuples:** bracketed, comma-separated types, which may be nested, e.g. `[int,[T1,str]]`
- **Function signatures:** a list of parameter types and a single return type, written as `(param1,param2,...) -> returnType`
### Data structures
Implement the following classes.
#### `Node`
Represents a type node.
- Construction:
- If `node_type` is a **string**, the node is a **leaf** representing either a primitive type or a generic.
- If `node_type` is a **list of Node**, the node is a **tuple** whose children are those nodes.
```python
class Node:
def __init__(self, node_type):
...
def to_str(self) -> str:
...
```
**`Node.to_str()` formatting rules**
- Leaf: return the exact string name, e.g. `"int"`, `"T1"`.
- Tuple: return bracketed, comma-separated children with **no spaces**, e.g. `[int,T1]`, `[int,[T1,str]]`.
#### `Function`
Represents a function signature.
```python
class Function:
def __init__(self, parameters: list[Node], output_type: Node):
...
def to_str(self) -> str:
...
```
**`Function.to_str()` formatting rules**
- Format: `(param1,param2,...) -> returnType` using each node’s `to_str()`.
- Example: `(int,T1) -> [T1,str]`
---
## Part 2: Infer concrete return type
Implement:
```python
def get_return_type(parameters: list[Node], function: Function) -> Node:
...
```
Given:
- `function.parameters`: the expected parameter type patterns (may include generics and nested tuples)
- `parameters`: the actual argument types supplied at a call site
Return a **new `Node`** representing the **concrete** return type after substituting generics.
### Requirements
1. **Argument count check**
- Raise an error if `len(parameters) != len(function.parameters)`.
2. **Build a substitution table (generic resolution)**
- Compare each expected parameter pattern against the corresponding actual argument type.
- If the expected node is a **generic** (e.g., `T1`), bind it to the actual type node.
- If the expected node is a **primitive**, it must match the actual primitive exactly.
- If the expected node is a **tuple**, the actual must also be a tuple of the same arity, and you must recurse.
3. **Conflict detection**
- If a generic like `T1` is bound more than once in the same call, all bindings must be structurally identical.
- Otherwise raise an error (e.g., `T1` cannot be both `int` and `str`).
4. **Return type substitution**
- After building the mapping, recursively replace all generics appearing in `function.output_type` with their bound concrete types.
- If a generic appears in the output type but was never bound by the inputs, raise an error (or clearly document and implement a consistent behavior).
### Examples
1. **Basic substitution**
- Function: `[T1, T2, int, T1] -> [T1, T2]`
- Arguments: `[int, str, int, int]`
- Mapping: `T1 = int`, `T2 = str`
- Result: `[int, str]`
2. **Nested tuples & complex generics**
- Function: `[[T1, float], T2, T3] -> [T3, T1]`
- Arguments: `[[str, float], [int, str], int]`
- Mapping: `T1 = str`, `T2 = [int, str]`, `T3 = int`
- Result: `[int, str]`
3. **Conflict error**
- Function: `[T1, T1] -> T1`
- Arguments: `[int, str]`
- Error: `T1` cannot be both `int` and `str`
Implement the above behavior with clear recursive logic and appropriate error handling.
Quick Answer: This question evaluates understanding of type systems, AST-style type representations, and generic type variable resolution by requiring implementation of Node and Function models and substitution of generics to infer concrete return types.
Part 1: Format Toy-Language Types and Function Signatures
Implement `solution(kind, data)` to produce the **canonical printed form** of a serialized Toy-Language type, in one of two modes selected by `kind`.
## Type expressions
A **type expression** is one of:
- A **leaf**: a non-empty string such as `'int'`, `'float'`, or `'T1'`.
- A **tuple type**: a Python `list` of type expressions (which may themselves be leaves or nested tuples).
A tuple type prints as its children joined by commas inside square brackets, with **no spaces** — and the brackets are always present, even for a single-element list.
Examples:
| Input expression | Prints as |
|---|---|
| `'int'` | `int` |
| `['int', 'T1']` | `[int,T1]` |
| `['T9']` | `[T9]` |
| `['int', ['T1', 'str']]` | `[int,[T1,str]]` |
## What to implement
`solution(kind, data)` returns a **string**. `kind` is either `'node'` or `'function'`.
### `kind == 'node'`
`data` is a single **type expression**. Format and return it using the rules above.
- A leaf returns its stored string exactly.
- A tuple returns `[` + its comma-separated formatted children + `]`, recursing into each child.
### `kind == 'function'`
`data` is a pair `(parameters, output_type)`, where:
- `parameters` is a **list** of type expressions (the parameter types), and
- `output_type` is a single type expression (the return type).
Format the full signature as:
```
(param1,param2,...) -> returnType
```
- The parameter list contains the formatted parameter types joined by commas, wrapped in parentheses, with **no spaces** inside.
- The return type is the formatted `output_type`.
- The **only** spaces in the output are the single spaces around `->`.
- A function may have **zero** parameters, which prints as `() -> returnType`.
Examples:
| Input | Prints as |
|---|---|
| `(['int', 'T1'], ['T1', 'str'])` | `(int,T1) -> [T1,str]` |
| `([], 'bool')` | `() -> bool` |
## Constraints
- `kind` is either `'node'` or `'function'`.
- Each leaf is a non-empty string.
- Tuple types are represented only by Python lists.
- `0 <= number of function parameters <= 10^3`.
- The total number of nodes across the input structure is at most `10^4`.
- The maximum nesting depth is at most `200`.
Constraints
- kind is either 'node' or 'function'
- Each leaf is a non-empty string
- Tuple types are represented only by Python lists
- 0 <= number of function parameters <= 10^3
- Total number of nodes across the input structure is at most 10^4
- Maximum nesting depth is at most 200
Examples
Input: ('node', 'int')
Expected Output: 'int'
Explanation: A leaf node prints as its exact stored name.
Input: ('node', ['int', ['T1', 'str']])
Expected Output: '[int,[T1,str]]'
Explanation: Nested tuple types are formatted recursively with no spaces.
Input: ('function', (['int', 'T1'], ['T1', 'str']))
Expected Output: '(int,T1) -> [T1,str]'
Explanation: Each parameter and the return type are formatted using the same node rules.
Input: ('function', ([], 'bool'))
Expected Output: '() -> bool'
Explanation: Edge case: a function may have zero parameters.
Input: ('node', ['T9'])
Expected Output: '[T9]'
Explanation: Edge case: a single-element tuple still uses brackets.
Hints
- Use one recursive helper to format a type expression: strings return themselves, lists format each child and join with commas.
- For a function signature, reuse the same type-formatting helper for every parameter and for the return type.
Part 2: Resolve Generic Return Types in Toy-Language
Infer the concrete return type of a generic Toy-Language function call by matching its parameter patterns against the actual argument types, then substituting the inferred bindings into the declared return type.
## Type expressions
A **type expression** is serialized in one of two ways:
- A **string leaf**, e.g. `'int'`, `'float'`, `'str'`, `'bool'`, or `'T1'`.
- A **list of type expressions**, representing a **tuple type**, e.g. `['int', 'str']`.
Two kinds of string leaves exist:
- **Primitive types** — lowercase strings (`'int'`, `'str'`, …).
- **Generic type variables** — strings whose first character is uppercase (`'T1'`, `'T2'`, …).
## What to implement
```python
def solution(actual_parameters, expected_parameters, output_type):
...
```
- **`actual_parameters`** — a list of type expressions giving the concrete type of each actual argument (no generics).
- **`expected_parameters`** — a list of type expressions giving the declared pattern for each parameter (may contain generics).
- **`output_type`** — a single type expression for the declared return type (may contain generics).
Return the **inferred return type** as a canonical string. If the call is invalid for any reason described below, return the string `'ERROR'`.
## Resolution rules
1. **Arity.** If `len(actual_parameters)` does not equal `len(expected_parameters)`, the call is invalid → `'ERROR'`.
2. **Matching.** Compare each expected parameter pattern against the corresponding actual type, position by position:
- **Expected generic** → bind that generic to the actual type.
- **Expected primitive** → the actual type must be the *exact same* primitive string.
- **Expected tuple** (list) → the actual type must also be a tuple (list) of the **same length**, and each child is matched recursively by the same rules.
- Any other mismatch (e.g. expected primitive vs. actual tuple, or differing tuple lengths) makes the call invalid → `'ERROR'`.
3. **Consistent bindings.** If the same generic variable is matched more than once, every binding must be **structurally identical**. Otherwise the call is invalid → `'ERROR'`.
4. **Substitution.** After all parameters match, rebuild `output_type` by recursively replacing every generic with its bound type, descending into tuples.
5. **Unbound generics.** If `output_type` contains a generic that was never bound during matching, the call is invalid → `'ERROR'`.
## Output format
Serialize the resolved return type to a **canonical string**:
- A primitive leaf is its own name, e.g. `'int'`.
- A tuple is written as its children joined by commas inside square brackets, with **no spaces**, e.g. `'[int,str]'`. Tuples may nest, e.g. `'[int,[str,bool]]'`.
## Examples
- `solution(['int', 'str', 'int', 'int'], ['T1', 'T2', 'int', 'T1'], ['T1', 'T2'])` → `'[int,str]'`
(`T1`→`int` consistently, `T2`→`str`; the literal `int` parameter matches.)
- `solution([['str', 'float'], ['int', 'str'], 'int'], [['T1', 'float'], 'T2', 'T3'], ['T3', 'T1'])` → `'[int,str]'`
(`T1`→`str`, `T2`→`['int','str']`, `T3`→`int`; output is `[T3, T1]` = `[int, str]`.)
- `solution(['int', 'str'], ['T1', 'T1'], 'T1')` → `'ERROR'`
(`T1` is bound to `int` then `str` — conflicting bindings.)
- `solution([], [], 'bool')` → `'bool'`
(No parameters; the return type is already concrete.)
- `solution(['int'], ['int'], 'T1')` → `'ERROR'`
(`T1` appears in the output but was never bound.)
Constraints
- 0 <= len(actual_parameters), len(expected_parameters) <= 10^3
- Tuple types are represented only by Python lists
- Total number of nodes across all input type expressions is at most 10^4
- Maximum nesting depth is at most 200
- Primitive type names are lowercase strings
- Generic names start with an uppercase letter
Examples
Input: (['int', 'str', 'int', 'int'], ['T1', 'T2', 'int', 'T1'], ['T1', 'T2'])
Expected Output: '[int,str]'
Explanation: T1 resolves to int and T2 resolves to str.
Input: ([['str', 'float'], ['int', 'str'], 'int'], [['T1', 'float'], 'T2', 'T3'], ['T3', 'T1'])
Expected Output: '[int,str]'
Explanation: Nested tuples are matched recursively before substituting the output.
Input: (['int', 'str'], ['T1', 'T1'], 'T1')
Expected Output: 'ERROR'
Explanation: T1 cannot be bound to both int and str in the same call.
Input: ([], [], 'bool')
Expected Output: 'bool'
Explanation: Edge case: a zero-parameter function with a concrete return type is valid.
Input: (['int'], ['int'], 'T1')
Expected Output: 'ERROR'
Explanation: The output generic T1 was never bound by the inputs.
Hints
- Walk expected and actual types together recursively, building a dictionary from generic name to concrete subtree.
- After matching the inputs, run a second recursive pass over the output type to substitute every generic.