PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates parsing and string-processing skills, including tokenization, stateful scanning, inline versus block-level markup handling, edge-case management (e.g., unmatched delimiters, trailing spaces, empty paragraphs) and reasoning about time and space complexity.

  • Medium
  • Samsara
  • Coding & Algorithms
  • Software Engineer

Implement Markdown-to-HTML parser

Company: Samsara

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Implement a function that converts a plain-text string into HTML supporting only these features: ( 1) Paragraphs: two or more consecutive newline characters separate paragraphs; wrap each paragraph in <p>...</p>. ( 2) Soft line breaks: a single newline inside a paragraph becomes <br/>. ( 3) Blockquotes: one or more consecutive lines that each begin with "> " (greater-than followed by a space) form a single <blockquote>...</blockquote>; within the blockquote, strip the leading "> " from each quoted line; preserve soft line breaks as <br/>; blockquotes cannot span paragraphs. ( 4) Strikethrough: text enclosed by a pair of tildes, e.g., ~~like this~~, becomes <del>like this</del>. ( 5) Formatting commands do not cross paragraphs; if a strikethrough is opened in one paragraph, it must close in the same paragraph or be treated as literal text. The output must be valid HTML; exact whitespace and self-closing syntax need not match any reference output. Provide the algorithm, discuss how you will tokenize/scan the input (single pass vs. multi-pass), specify time and space complexity in terms of input length n, and describe how you will handle edge cases such as trailing spaces, empty paragraphs, consecutive blockquote groups, malformed or unmatched tildes, and lines that mix quoted and non-quoted text. Include a few test cases, for example: "This is a paragraph with a soft line break. > Some quoted text > continues here This has a ~~strikethrough~~ word."

Quick Answer: This question evaluates parsing and string-processing skills, including tokenization, stateful scanning, inline versus block-level markup handling, edge-case management (e.g., unmatched delimiters, trailing spaces, empty paragraphs) and reasoning about time and space complexity.

Implement `solution(text)` that converts a plain-text string into HTML using only a limited Markdown subset. Normalize line endings first. A blank line is any line whose trimmed contents are empty, and blank lines end the current block. A maximal consecutive run of lines starting with `> ` becomes one `<blockquote>...</blockquote>` block; remove the leading `> ` from each quoted line. A maximal consecutive run of other non-blank lines becomes one `<p>...</p>` block. Inside any block, line breaks between its lines become `<br/>`. Inline strikethrough uses pairs of `~~` inside a single block; pair markers from left to right, convert the enclosed text to `<del>...</del>`, and if the last opening `~~` in a block has no closing `~~` in that same block, keep it as literal text. Escape `&`, `<`, and `>` in normal text so the output is valid HTML. A line beginning with `>` but not the exact prefix `> ` is normal paragraph text. Return all rendered blocks concatenated in order with no extra separators.

Constraints

  • 0 <= len(text) <= 200000
  • The parser only needs to support paragraphs, soft line breaks, blockquotes, and strikethrough as described.
  • A blockquote marker is recognized only when a line starts with the exact prefix `> `.
  • Strikethrough markers `~~` are paired left to right within the same block only; unmatched final `~~` is treated as literal text.

Examples

Input: ("This is a paragraph with a\nsoft line break.\n\n> Some quoted text\n> continues here\n\nThis has a ~~strikethrough~~ word.",)

Expected Output: "<p>This is a paragraph with a<br/>soft line break.</p><blockquote>Some quoted text<br/>continues here</blockquote><p>This has a <del>strikethrough</del> word.</p>"

Explanation: Two paragraph lines become one `<p>` with `<br/>`, the quoted lines become one `<blockquote>`, and the paired `~~` becomes `<del>`.

Input: ("5 < 6 & 7 > 3\n>not a quote\n~~open only",)

Expected Output: "<p>5 &lt; 6 &amp; 7 &gt; 3<br/>&gt;not a quote<br/>~~open only</p>"

Explanation: The second line is not a blockquote because it starts with `>` but not `> `. The final unmatched `~~` is kept literally.

Input: ("para\n> quote\nstill para",)

Expected Output: "<p>para</p><blockquote>quote</blockquote><p>still para</p>"

Explanation: Switching between non-quote and quote lines creates new blocks even without blank lines.

Input: ("Start ~~across\r\nlines~~ end\r\n\r\n> q1\r> q2",)

Expected Output: "<p>Start <del>across<br/>lines</del> end</p><blockquote>q1<br/>q2</blockquote>"

Explanation: Line endings are normalized first. The strikethrough spans two lines inside one paragraph block.

Input: (" \n\t\r\n",)

Expected Output: ""

Explanation: All lines are blank after trimming, so no blocks are produced.

Input: ("~~<x&y>~~ and ~~~~",)

Expected Output: "<p><del>&lt;x&amp;y&gt;</del> and <del></del></p>"

Explanation: HTML-sensitive characters are escaped inside the deleted text, and adjacent `~~` pairs can produce an empty `<del></del>`.

Solution

def solution(text):
    text = text.replace('\r\n', '\n').replace('\r', '\n')

    def escape_text(s):
        return s.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('\n', '<br/>')

    def render_inline(raw):
        parts = raw.split('~~')
        delimiter_count = len(parts) - 1
        out = [escape_text(parts[0])]
        index = 1

        for _ in range(delimiter_count // 2):
            out.append('<del>')
            out.append(escape_text(parts[index]))
            out.append('</del>')
            index += 1
            out.append(escape_text(parts[index]))
            index += 1

        if delimiter_count % 2 == 1:
            out.append('~~')
            out.append(escape_text(parts[index]))

        return ''.join(out)

    blocks = []
    current_type = None
    current_lines = []

    def flush():
        nonlocal current_type, current_lines
        if current_type is None:
            return

        content = '\n'.join(current_lines)
        html = render_inline(content)

        if current_type == 'blockquote':
            blocks.append('<blockquote>' + html + '</blockquote>')
        else:
            blocks.append('<p>' + html + '</p>')

        current_type = None
        current_lines = []

    for line in text.split('\n'):
        if line.strip() == '':
            flush()
            continue

        line_type = 'blockquote' if line.startswith('> ') else 'paragraph'
        line_content = line[2:] if line_type == 'blockquote' else line

        if current_type is None:
            current_type = line_type
            current_lines = [line_content]
        elif current_type == line_type:
            current_lines.append(line_content)
        else:
            flush()
            current_type = line_type
            current_lines = [line_content]

    flush()
    return ''.join(blocks)

Time complexity: O(n). Space complexity: O(n).

Hints

  1. Scan the input line by line. Blank lines end the current block, and switching between quoted and non-quoted lines also starts a new block.
  2. Handle inline parsing one block at a time. A simple state machine for whether you are currently inside `~~...~~` avoids quadratic rescanning.
Last updated: May 5, 2026

Related Coding Questions

  • Parse Markdown Links Without Regex - Samsara (Medium)

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.