Implement XML tokenizer and parser with operations
Company: Snapchat
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
You are given either
(a) a raw XML-like string such as <catalog><book><author>Gambardella, Matthew</author></book></catalog> or
(b) its tokenized form as a list of dictionaries like [{'text': 'catalog', 'token_type': 'open_tag'}, {'text': 'book', 'token_type': 'open_tag'}, {'text': 'author', 'token_type': 'open_tag'}, {'text': 'Gambardella, Matthew', 'token_type': 'raw_text'}, {'text': 'author', 'token_type': 'close_tag'}, {'text': 'book', 'token_type': 'close_tag'}, {'text': 'catalog', 'token_type': 'close_tag'}]. Implement:
1) tokenize(xml_str) -> list[dict] that emits tokens where token_type ∈ {open_tag, close_tag, raw_text};
2) class XMLParser with __init__(tokens: list[dict]) that validates the structure and raises an exception for malformed input;
3) to_string()/__str__() that reconstructs the original XML;
4) add_element(path: list[str], tag: str, text: str|null, index: int|null) to insert a new element under the node identified by path;
5) remove_element(path: list[str]) to delete a node;
6) traverse_iterative() that performs an iterative DFS (no recursion) and yields nodes in preorder. Constraints and requirements: use a linear scan with a stack for validation; overall validation should be O
(n) time and O
(h) space where h is tree height; tags have no attributes and text may contain any characters except '<' and '>'; handle edge cases such as mismatched or out-of-order closing tags, unclosed tags at EOF, empty token lists, and extraneous raw_text between sibling tags. Describe your data structures, algorithms, and time/space complexity for each method.
Quick Answer: This question evaluates XML tokenization, hierarchical tree construction and validation, stack-based parsing, iterative DFS traversal, and API-level manipulation of tree nodes, assessing data-structure design, parsing correctness, and algorithmic complexity reasoning.