Implement type AST and infer generics
Company: OpenAI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's ability to design an abstract syntax tree representation and implement generic type inference for a small language, exercising skills in type systems, structural recursion, and detecting conflicting bindings.
Part 1: Build a Type AST
Constraints
- 1 <= len(type_expr) <= 10^4
- Valid primitive names are char, int, and float
- Generic names are T followed by one or more digits
- Maximum nesting depth <= 10^3
Examples
Input: "int"
Expected Output: ("primitive", "int")
Explanation: A primitive becomes a primitive node.
Input: "T20"
Expected Output: ("generic", "T20")
Explanation: A generic type variable becomes a generic node.
Input: "(int,T1,float)"
Expected Output: ("tuple", [("primitive", "int"), ("generic", "T1"), ("primitive", "float")])
Explanation: A flat tuple stores its children in order.
Input: "(int,(T1,char),float)"
Expected Output: ("tuple", [("primitive", "int"), ("tuple", [("generic", "T1"), ("primitive", "char")]), ("primitive", "float")])
Explanation: Nested tuples become nested tuple nodes.
Input: " ( int , ( T2 , char ) ) "
Expected Output: ("tuple", [("primitive", "int"), ("tuple", [("generic", "T2"), ("primitive", "char")])])
Explanation: Whitespace is ignored.
Hints
- A recursive-descent parser fits this grammar well: parse one type, and if you see '(', recursively parse comma-separated child types until ')'.
- Tokenize lazily: for non-tuple types, read one contiguous alphanumeric token and then decide whether it is a primitive or a generic.
Part 2: Serialize a Type AST to Canonical String
Constraints
- The total number of AST nodes is at most 10^4
- Tuple nodes contain at least 2 children
- The AST is valid and uses only the node kinds primitive, generic, and tuple
Examples
Input: ("primitive", "char")
Expected Output: "char"
Explanation: A primitive leaf serializes to its name.
Input: ("generic", "T3")
Expected Output: "T3"
Explanation: A generic leaf also serializes to its name.
Input: ("tuple", [("primitive", "int"), ("generic", "T1")])
Expected Output: "(int,T1)"
Explanation: Children are joined with commas and wrapped in parentheses.
Input: ("tuple", [("primitive", "int"), ("tuple", [("generic", "T1"), ("primitive", "char")]), ("primitive", "float")])
Expected Output: "(int,(T1,char),float)"
Explanation: Nested tuples serialize recursively.
Hints
- This is a tree traversal problem: serialize each child first, then combine them for tuple nodes.
- Primitive and generic nodes are both leaves, so their string form is just their stored name.
Part 3: Infer Generic Bindings from a Pattern Type
Constraints
- 1 <= len(pattern), len(concrete) <= 10^4
- Valid primitive names are char, int, and float
- Generics appear only in pattern; if concrete contains a generic, return "ERROR"
- Maximum nesting depth <= 10^3
Examples
Input: ("(T1,int)", "(char,int)")
Expected Output: {"T1": "char"}
Explanation: T1 must be char for the tuple to match.
Input: ("(T1,(T2,char))", "(float,(int,char))")
Expected Output: {"T1": "float", "T2": "int"}
Explanation: Bindings are inferred recursively inside nested tuples.
Input: ("(int,float)", "(int,float)")
Expected Output: {}
Explanation: No generics appear, so the mapping is empty.
Input: ("(T1,T1)", "(int,float)")
Expected Output: "ERROR"
Explanation: T1 cannot be both int and float.
Input: ("(int,float)", "int")
Expected Output: "ERROR"
Explanation: A tuple cannot match a non-tuple.
Hints
- Parse both strings into trees first. Working with structure is much easier than comparing raw text.
- When you see a generic in the pattern, either bind it for the first time or check that the new candidate matches its previous binding exactly.