Implement Markdown-to-HTML parser
Company: Samsara
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates parsing and string-processing skills, including tokenization, stateful scanning, inline versus block-level markup handling, edge-case management (e.g., unmatched delimiters, trailing spaces, empty paragraphs) and reasoning about time and space complexity.
Constraints
- 0 <= len(text) <= 200000
- The parser only needs to support paragraphs, soft line breaks, blockquotes, and strikethrough as described.
- A blockquote marker is recognized only when a line starts with the exact prefix `> `.
- Strikethrough markers `~~` are paired left to right within the same block only; unmatched final `~~` is treated as literal text.
Examples
Input: ("This is a paragraph with a\nsoft line break.\n\n> Some quoted text\n> continues here\n\nThis has a ~~strikethrough~~ word.",)
Expected Output: "<p>This is a paragraph with a<br/>soft line break.</p><blockquote>Some quoted text<br/>continues here</blockquote><p>This has a <del>strikethrough</del> word.</p>"
Explanation: Two paragraph lines become one `<p>` with `<br/>`, the quoted lines become one `<blockquote>`, and the paired `~~` becomes `<del>`.
Input: ("5 < 6 & 7 > 3\n>not a quote\n~~open only",)
Expected Output: "<p>5 < 6 & 7 > 3<br/>>not a quote<br/>~~open only</p>"
Explanation: The second line is not a blockquote because it starts with `>` but not `> `. The final unmatched `~~` is kept literally.
Input: ("para\n> quote\nstill para",)
Expected Output: "<p>para</p><blockquote>quote</blockquote><p>still para</p>"
Explanation: Switching between non-quote and quote lines creates new blocks even without blank lines.
Input: ("Start ~~across\r\nlines~~ end\r\n\r\n> q1\r> q2",)
Expected Output: "<p>Start <del>across<br/>lines</del> end</p><blockquote>q1<br/>q2</blockquote>"
Explanation: Line endings are normalized first. The strikethrough spans two lines inside one paragraph block.
Input: (" \n\t\r\n",)
Expected Output: ""
Explanation: All lines are blank after trimming, so no blocks are produced.
Input: ("~~<x&y>~~ and ~~~~",)
Expected Output: "<p><del><x&y></del> and <del></del></p>"
Explanation: HTML-sensitive characters are escaped inside the deleted text, and adjacent `~~` pairs can produce an empty `<del></del>`.
Solution
def solution(text):
text = text.replace('\r\n', '\n').replace('\r', '\n')
def escape_text(s):
return s.replace('&', '&').replace('<', '<').replace('>', '>').replace('\n', '<br/>')
def render_inline(raw):
parts = raw.split('~~')
delimiter_count = len(parts) - 1
out = [escape_text(parts[0])]
index = 1
for _ in range(delimiter_count // 2):
out.append('<del>')
out.append(escape_text(parts[index]))
out.append('</del>')
index += 1
out.append(escape_text(parts[index]))
index += 1
if delimiter_count % 2 == 1:
out.append('~~')
out.append(escape_text(parts[index]))
return ''.join(out)
blocks = []
current_type = None
current_lines = []
def flush():
nonlocal current_type, current_lines
if current_type is None:
return
content = '\n'.join(current_lines)
html = render_inline(content)
if current_type == 'blockquote':
blocks.append('<blockquote>' + html + '</blockquote>')
else:
blocks.append('<p>' + html + '</p>')
current_type = None
current_lines = []
for line in text.split('\n'):
if line.strip() == '':
flush()
continue
line_type = 'blockquote' if line.startswith('> ') else 'paragraph'
line_content = line[2:] if line_type == 'blockquote' else line
if current_type is None:
current_type = line_type
current_lines = [line_content]
elif current_type == line_type:
current_lines.append(line_content)
else:
flush()
current_type = line_type
current_lines = [line_content]
flush()
return ''.join(blocks)
Time complexity: O(n). Space complexity: O(n).
Hints
- Scan the input line by line. Blank lines end the current block, and switching between quoted and non-quoted lines also starts a new block.
- Handle inline parsing one block at a time. A simple state machine for whether you are currently inside `~~...~~` avoids quadratic rescanning.