You are given either
(a) a raw XML-like string such as <catalog><book><author>Gambardella, Matthew</author></book></catalog> or
(b) its tokenized form as a list of dictionaries like [{'text': 'catalog', 'token_type': 'open_tag'}, {'text': 'book', 'token_type': 'open_tag'}, {'text': 'author', 'token_type': 'open_tag'}, {'text': 'Gambardella, Matthew', 'token_type': 'raw_text'}, {'text': 'author', 'token_type': 'close_tag'}, {'text': 'book', 'token_type': 'close_tag'}, {'text': 'catalog', 'token_type': 'close_tag'}]. Implement:
- tokenize(xml_str) -> list[dict] that emits tokens where token_type ∈ {open_tag, close_tag, raw_text};
- class XMLParser with:
  - __init__(tokens: list[dict]), which validates the structure and raises an exception for malformed input;
  - to_string() / __str__(), which reconstructs the original XML;
  - add_element(path: list[str], tag: str, text: str | None, index: int | None), which inserts a new element under the node identified by path;
  - remove_element(path: list[str]), which deletes a node;
  - traverse_iterative(), which performs an iterative DFS (no recursion) and yields nodes in preorder.

Constraints and requirements: use a linear scan with a stack for validation; overall validation should run in O(n) time and O(h) space, where h is the tree height; tags have no attributes, and text may contain any characters except '<' and '>'; handle edge cases such as mismatched or out-of-order closing tags, unclosed tags at EOF, empty token lists, and extraneous raw_text between sibling tags. Describe your data structures, algorithms, and time/space complexity for each method.