PracHub

Quick Overview

This question evaluates string parsing, stateful text-processing, and algorithmic chunking skills, focusing on hierarchical header tracking and enforcing size constraints while preserving document order.

  • hard
  • Sierra
  • Coding & Algorithms
  • Software Engineer

Split Markdown Into Header-Aware Chunks

Company: Sierra

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

Implement a function that splits a Markdown document into ordered chunks, where each chunk must not exceed a given size limit.

A Markdown header is a line that starts with one to six `#` characters followed by a space, such as `# Title` or `## Section`. Non-header lines belong to the current active header hierarchy.

When a new chunk starts in the middle of a section, the chunk must begin by repeating all active parent headers for the first content line in that chunk. For example, if a paragraph belongs under `# Header 1` and `## Header 2`, and a chunk boundary occurs immediately before that paragraph, the new chunk should start with:

```markdown
# Header 1
## Header 2
```

followed by the paragraph.

Write a function such as:

```python
def split_markdown(markdown: str, max_size: int) -> list[str]:
    pass
```

Requirements:

  • Preserve the original order of the document content.
  • Each returned chunk must have length at most `max_size`.
  • Header context added to a chunk counts toward the size limit.
  • Header hierarchy should be updated correctly when encountering headers of different levels.
  • You may assume that any single indivisible line plus its required header context can fit within `max_size`.
  • Define and document any reasonable assumptions about whitespace and newline handling.
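The header rule stated above can be sketched as a small classifier. This is a minimal illustration, not part of the original prompt: the helper name `header_level` is hypothetical, and it assumes a header must start at the first character of the line, with exactly one to six `#` characters followed by a space.

```python
import re

# Assumption: 1-6 '#' characters at the very start of the line, then a space.
# A run of seven or more '#' characters is treated as ordinary text.
_HEADER_RE = re.compile(r"(#{1,6}) ")


def header_level(line: str) -> int:
    """Return the header level (1-6) of `line`, or 0 if it is not a header."""
    match = _HEADER_RE.match(line)  # match() anchors at the start of the line
    return len(match.group(1)) if match else 0
```

Returning 0 for non-headers keeps the result usable directly in a truth test, so callers can write `if header_level(line):` without a separate predicate.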

Implement `solution(markdown, max_size)` to split a Markdown document into ordered chunks. Treat the input as lines produced by `markdown.split('\n')`; blank lines are preserved, and an empty input string should return `[]`.

A Markdown header is a line that starts at column 0 with 1 to 6 `#` characters followed by a space. When a header of level `L` is encountered, it becomes the active header for level `L`, and any active headers at deeper levels are cleared. Non-header lines belong to the current active header hierarchy.

Build the answer greedily: start each chunk at the next unread original line, prepend all active parent headers for that first original line, then append as many consecutive original lines as possible without making the returned chunk longer than `max_size`. If the first original line of a chunk is itself a header, prepend only its parent headers (active headers at shallower levels), so the header line itself appears once.

Chunk size is measured by the final string length, including inserted context and every `\n` separator between lines. Return each chunk as a string joined with `\n`, with no extra trailing newline added.
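To make the header-hierarchy rule concrete, the sketch below computes, for every line, the parent headers that would have to be repeated if a chunk started at that line. The names `header_level` and `parent_contexts` are hypothetical helpers introduced only for illustration; storing one list per line is done for readability, and a real solution would more likely track the active headers incrementally.

```python
def header_level(line: str) -> int:
    """Return 1-6 for a header line, or 0 otherwise (assumes column-0 headers)."""
    hashes = len(line) - len(line.lstrip("#"))
    return hashes if 1 <= hashes <= 6 and line[hashes:hashes + 1] == " " else 0


def parent_contexts(lines: list[str]) -> list[list[str]]:
    """For each line, list the headers to repeat if a chunk begins there."""
    active = [None] * 7  # slots 1..6 hold the active header per level; 0 unused
    contexts = []
    for line in lines:
        level = header_level(line)
        if level:
            # A header repeats only shallower (parent) headers, never itself.
            contexts.append([h for h in active[1:level] if h is not None])
            # The new header replaces its level and clears everything deeper.
            for deeper in range(level, 7):
                active[deeper] = None
            active[level] = line
        else:
            # Ordinary content sits under every currently active header.
            contexts.append([h for h in active[1:] if h is not None])
    return contexts
```

For the lines `# A`, `## B`, `xx`, `### C`, `y`, `## D`, `z`, the context for `y` is `# A`, `## B`, `### C`, while the context for `z` is only `# A`, `## D`, because `## D` cleared the deeper `### C`.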

Constraints

  • 0 <= len(markdown) <= 200000
  • 1 <= max_size <= 200000
  • Only lines that begin with 1 to 6 `#` characters followed by a space are headers
  • Any single original line together with its required repeated parent-header context fits within `max_size`

Examples

Input: ("", 10)

Expected Output: []

Explanation: An empty document has no lines, so there are no chunks.

Input: ("# Title\nHello\n## Section\nWorld", 100)

Expected Output: ["# Title\nHello\n## Section\nWorld"]

Explanation: The whole document fits in one chunk, so nothing needs to be repeated.

Input: ("# H1\nIntro line\n## H2\nParagraph one\nParagraph two", 24)

Expected Output: ["# H1\nIntro line\n## H2", "# H1\n## H2\nParagraph one", "# H1\n## H2\nParagraph two"]

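Explanation: Greedy chunking stops the first chunk after `## H2` (21 characters), because appending `Paragraph one` would bring it to 35. The next two chunks start inside that subsection, so both active headers are repeated, and each of those chunks is exactly 24 characters long.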

Input: ("# A\n## B\nxx\n### C\ny\n## D\nz", 16)

Expected Output: ["# A\n## B\nxx", "# A\n## B\n### C\ny", "# A\n## D\nz"]

Explanation: When `## D` appears, it replaces the earlier level-2 header and clears the deeper `### C`, so only `# A` is repeated before `## D`.

Input: ("# T\n\nA\nB", 6)

Expected Output: ["# T\n\nA", "# T\nB"]

Explanation: Blank lines are preserved as ordinary lines. The second chunk repeats `# T` because `B` is still inside that section.

Input: ("### Deep\nx\n## Mid\ny", 10)

Expected Output: ["### Deep\nx", "## Mid\ny"]

Explanation: A shallower level-2 header clears the previously active level-3 header, so `## Mid` starts a new hierarchy with no repeated parent headers.

Hints

  1. Maintain the currently active headers by level. When you see a level-`L` header, clear levels `L` through `6`, then store the new header at level `L`.
  2. For each potential chunk start, determine the header prefix that must be repeated, then greedily append consecutive lines while tracking the exact joined-string length, including `\n` separators.
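
Putting the two hints together, one possible greedy implementation is sketched below. It is an illustrative sketch under the stated rules rather than an official reference solution: `header_level` and `advance` are hypothetical helper names, and the code relies on the guarantee that any single line plus its repeated context fits within `max_size` (it does not check or enforce that).

```python
def header_level(line: str) -> int:
    """Return 1-6 for a header line, or 0 otherwise (assumes column-0 headers)."""
    hashes = len(line) - len(line.lstrip("#"))
    return hashes if 1 <= hashes <= 6 and line[hashes:hashes + 1] == " " else 0


def solution(markdown: str, max_size: int) -> list[str]:
    if markdown == "":
        return []  # "".split("\n") would yield [""], so handle this case first
    lines = markdown.split("\n")
    active = [None] * 7  # slots 1..6 hold the active header per level
    chunks = []

    def advance(line: str) -> None:
        """Update the active hierarchy after consuming `line`."""
        level = header_level(line)
        if level:
            for deeper in range(level, 7):
                active[deeper] = None
            active[level] = line

    i, n = 0, len(lines)
    while i < n:
        first = lines[i]
        # A header repeats only shallower levels; content repeats all levels.
        stop = header_level(first) or 7
        parts = [h for h in active[1:stop] if h is not None]
        parts.append(first)
        # Joined length = total characters + one "\n" between adjacent parts.
        size = sum(len(p) for p in parts) + len(parts) - 1
        advance(first)
        i += 1
        # Greedily take following lines while the joined string still fits.
        while i < n and size + 1 + len(lines[i]) <= max_size:
            parts.append(lines[i])
            size += 1 + len(lines[i])
            advance(lines[i])
            i += 1
        chunks.append("\n".join(parts))
    return chunks
```

Tracking `size` incrementally avoids re-joining the chunk after every appended line, so the whole pass stays linear in the input length apart from the repeated header context, which is at most six lines per chunk.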
Last updated: May 19, 2026
