Implement XML tokenizer and parser with operations

Q: Implement XML tokenizer and parser with operations

This question evaluates XML tokenization, hierarchical tree construction and validation, stack-based parsing, iterative DFS traversal, and API-level manipulation of tree nodes, assessing data-structure design, parsing correctness, and algorithmic complexity reasoning.

Q: How do I approach Coding & Algorithms interview questions?

Coding & Algorithms questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master coding & algorithms interviews.

Question

You are given either (a) a raw XML-like string such as <catalog><book><author>Gambardella, Matthew</author></book></catalog> or (b) its tokenized form as a list of dictionaries like [{'text': 'catalog', 'token_type': 'open_tag'}, {'text': 'book', 'token_type': 'open_tag'}, {'text': 'author', 'token_type': 'open_tag'}, {'text': 'Gambardella, Matthew', 'token_type': 'raw_text'}, {'text': 'author', 'token_type': 'close_tag'}, {'text': 'book', 'token_type': 'close_tag'}, {'text': 'catalog', 'token_type': 'close_tag'}]. Implement:

tokenize(xml_str) -> list[dict] that emits tokens where token_type ∈ {open_tag, close_tag, raw_text};
class XMLParser with init (tokens: list[dict]) that validates the structure and raises an exception for malformed input;
to_string()/ str () that reconstructs the original XML;
add_element(path: list[str], tag: str, text: str|null, index: int|null) to insert a new element under the node identified by path;
remove_element(path: list[str]) to delete a node;
traverse_iterative() that performs an iterative DFS (no recursion) and yields nodes in preorder. Constraints and requirements: use a linear scan with a stack for validation; overall validation should be O (n) time and O (h) space where h is tree height; tags have no attributes and text may contain any characters except '<' and '>'; handle edge cases such as mismatched or out-of-order closing tags, unclosed tags at EOF, empty token lists, and extraneous raw_text between sibling tags. Describe your data structures, algorithms, and time/space complexity for each method.

Implement XML tokenizer and parser with operations

Overview

Comments (0)