Implement a JSON Parser
Company: Snowflake
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
## Implement a JSON Parser
Write a parser that takes a single string containing one JSON value and returns the corresponding in-memory data structure. You must implement the parsing yourself — do **not** call a built-in JSON library (`json.loads`, `JSON.parse`, `Gson`, etc.).
Your parser must support the full JSON grammar:
- **Objects** — `{ "key": value, ... }`, where keys are strings and values are any JSON value. The empty object `{}` is valid.
- **Arrays** — `[ value, value, ... ]`, where each element is any JSON value. The empty array `[]` is valid.
- **Strings** — double-quoted, with escape sequences: `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, and 4-hex-digit Unicode escapes `\uXXXX`.
- **Numbers** — integers and floating point, including an optional leading minus sign, a fractional part, and an exponent (e.g. `0`, `-12`, `3.14`, `1e10`, `-2.5E-3`). JSON does not allow leading zeros (e.g. `01`) or a leading `+`.
- **Literals** — `true`, `false`, and `null`.
- **Whitespace** — spaces, tabs, newlines, and carriage returns may appear between any tokens and must be ignored.
Map the parsed result to the host language's natural types (for example, in Python: object → `dict`, array → `list`, string → `str`, number → `int`/`float`, `true`/`false` → `bool`, `null` → `None`).
### Validation
The input is **not** guaranteed to be valid JSON. If the input is malformed, your parser must reject it by raising/throwing an error (rather than returning a partial or incorrect result). Cases that must be rejected include, but are not limited to:
- Trailing commas: `[1, 2, ]` or `{"a": 1,}`.
- Missing commas, colons, or brackets: `{"a" 1}`, `[1 2]`, `{"a": 1`.
- Unquoted object keys: `{a: 1}`.
- Unterminated strings or invalid escape sequences.
- Numbers with leading zeros (`012`), a leading `+`, or a bare `.` / trailing `.` (`.5`, `5.`).
- Extra non-whitespace characters after a complete, valid JSON value: `{"a": 1} garbage`.
- Empty input or input that is only whitespace.
### Examples
Input:
```
{"name": "Ada", "age": 36, "langs": ["Pascal", "Ada"], "retired": false, "spouse": null}
```
Output (as a structured value):
```
{ name: "Ada", age: 36, langs: ["Pascal", "Ada"], retired: false, spouse: null }
```
Input:
```
[1, -2.5, 3e2, true, null]
```
Output:
```
[1, -2.5, 300.0, true, null]
```
Input:
```
{"a": 1,}
```
Output: an error (trailing comma is invalid).
Input:
```
[1 2]
```
Output: an error (missing comma between elements).
### Constraints
- The input is a single string of up to roughly $10^5$ characters.
- Nesting (objects/arrays inside objects/arrays) may be up to a few hundred levels deep; aim for $O(n)$ time in the input length.
- A clean, idiomatic implementation is expected: a tokenizer/scanner plus a recursive-descent (or equivalent) parser, with clear handling of each value type and explicit error reporting on malformed input.
Quick Answer: This question tests practical knowledge of parsing theory, including lexical analysis and recursive-descent parsing applied to a well-defined grammar. It evaluates a software engineer's ability to implement low-level string processing, handle edge cases in formal grammars, and produce robust error handling — skills commonly assessed in coding interviews at the senior level.
Write a function that takes a single string containing one JSON value and returns the corresponding in-memory data structure, **without** calling any built-in JSON library (`json.loads`, `JSON.parse`, `Gson`, etc.). Implement a tokenizer plus a recursive-descent parser.
Your parser must support the full JSON grammar:
- **Objects** — `{ "key": value, ... }` (keys are strings; `{}` is valid).
- **Arrays** — `[ value, value, ... ]` (`[]` is valid).
- **Strings** — double-quoted, with escapes `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, and 4-hex-digit `\uXXXX`.
- **Numbers** — optional leading `-`, integer part (no leading zeros, no leading `+`), optional fraction, optional exponent (`0`, `-12`, `3.14`, `1e10`, `-2.5E-3`).
- **Literals** — `true`, `false`, `null`.
- **Whitespace** — spaces, tabs, newlines, carriage returns between tokens are ignored.
Map to natural types (object→map/dict, array→list, string→str, number→int/float, `true`/`false`→bool, `null`→null/None).
### Validation
Input is **not** guaranteed valid. On malformed input the parser must **not** return a partial result.
**For this console, signal an error by returning the sentinel string `"<<PARSE_ERROR>>"`** (instead of throwing), so the result is directly comparable. Internally you should still detect the error precisely — e.g. raise an exception in your parser and convert it to the sentinel at the top level.
Cases that must be rejected (→ `"<<PARSE_ERROR>>"`) include: trailing commas (`[1, 2, ]`, `{"a": 1,}`); missing commas/colons/brackets (`{"a" 1}`, `[1 2]`, `{"a": 1`); unquoted keys (`{a: 1}`); unterminated strings or invalid escapes; numbers with leading zeros (`012`), a leading `+`, or a bare/trailing dot (`.5`, `5.`); extra non-whitespace after a complete value (`{"a": 1} garbage`); and empty / whitespace-only input.
### Examples
- `solution('[1, -2.5, 3e2, true, null]')` → `[1, -2.5, 300.0, True, None]`
- `solution('{"a": 1,}')` → `'<<PARSE_ERROR>>'` (trailing comma)
- `solution('[1 2]')` → `'<<PARSE_ERROR>>'` (missing comma)
### Constraints
- Input is a single string up to ~10^5 characters.
- Nesting may be a few hundred levels deep; aim for O(n) time in the input length.
Constraints
- Input is a single string up to ~10^5 characters.
- Nesting may be a few hundred levels deep; aim for O(n) time.
- Do not call a built-in JSON library (json.loads / JSON.parse / Gson, etc.).
- On malformed input, return the sentinel string "<<PARSE_ERROR>>" rather than a partial result.
- Numbers reject leading zeros, a leading '+', and a bare/trailing dot; JSON allows only a leading '-'.
Examples
Input: ('{"name": "Ada", "age": 36, "langs": ["Pascal", "Ada"], "retired": false, "spouse": null}',)
Expected Output: {'name': 'Ada', 'age': 36, 'langs': ['Pascal', 'Ada'], 'retired': False, 'spouse': None}
Explanation: A full object with string, int, nested array, bool, and null values parses to the natural Python types.
Input: ('[1, -2.5, 3e2, true, null]',)
Expected Output: [1, -2.5, 300.0, True, None]
Explanation: Mixed array: integer stays int, fraction and exponent become floats (3e2 → 300.0), literals map to True/None.
Input: ('{"a": 1,}',)
Expected Output: '<<PARSE_ERROR>>'
Explanation: Trailing comma after the last object member is invalid JSON, so the parser returns the error sentinel.
Input: ('[1 2]',)
Expected Output: '<<PARSE_ERROR>>'
Explanation: Missing comma between array elements — after parsing 1 the next non-space char is neither ',' nor ']'.
Input: (' {} ',)
Expected Output: {}
Explanation: Surrounding whitespace is ignored and the empty object is valid.
Input: ('[]',)
Expected Output: []
Explanation: The empty array is valid.
Input: ('"a\\nb\\t\\u0041"',)
Expected Output: 'a\nb\tA'
Explanation: String escapes: \n and \t become control characters and \u0041 decodes to 'A'.
Input: ('012',)
Expected Output: '<<PARSE_ERROR>>'
Explanation: Leading zeros are not allowed in JSON numbers; '012' is rejected.
Input: ('5.',)
Expected Output: '<<PARSE_ERROR>>'
Explanation: A trailing dot with no fractional digits is invalid.
Input: (' ',)
Expected Output: '<<PARSE_ERROR>>'
Explanation: Whitespace-only (or empty) input contains no JSON value and is rejected.
Input: ('{"a": 1} garbage',)
Expected Output: '<<PARSE_ERROR>>'
Explanation: Extra non-whitespace characters after a complete value make the input malformed.
Input: ('-0.5E-3',)
Expected Output: -0.0005
Explanation: Negative number with a fraction and a negative exponent parses to a float.
Input: ('{"nested": {"x": [true, {"y": null}], "z": -7}}',)
Expected Output: {'nested': {'x': [True, {'y': None}], 'z': -7}}
Explanation: Deeply nested objects and arrays parse recursively into nested dicts/lists.
Input: ('{a: 1}',)
Expected Output: '<<PARSE_ERROR>>'
Explanation: Object keys must be double-quoted strings; an unquoted key is rejected.
Hints
- Use a single position cursor over the string and a set of mutually-recursive helpers: parse_value, parse_object, parse_array, parse_string, parse_number. Skip whitespace before each value and around structural characters.
- parse_value dispatches on the first non-whitespace character: '{' → object, '[' → array, '"' → string, '-' or a digit → number, and the keywords true/false/null. Anything else is an error.
- For numbers, validate the grammar yourself: optional '-', then either a single '0' or a non-zero digit followed by digits (this rejects leading zeros like 012), then an optional '.' with at least one digit, then an optional exponent. Reject a trailing dot like '5.'.
- After parsing the top-level value, skip trailing whitespace and require the cursor to be at the end of the string — otherwise input like '{"a": 1} garbage' is malformed.
- Detect errors precisely by raising an internal exception wherever the grammar is violated, then catch it at the top level and return "<<PARSE_ERROR>>". Also catch RecursionError for pathologically deep input.