PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of sequence reconstruction, graph modeling, and traversal/orientation constraints, focusing on ordering labeled fragments into a single non-repeating chain.

  • nan
  • Microsoft
  • Coding & Algorithms
  • Software Engineer

Reconstruct DNA from tagged fragments

Company: Microsoft

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: nan

Interview Round: Technical Screen

You are given a list of DNA fragments. Each fragment is represented by a `Sequence` object containing two endpoint labels and a DNA string `payload`. Your task is to implement: ```text String shotgunSequence(List<Sequence> sequences) ``` The function must order the fragments into a single chain that uses **every fragment exactly once**, then return the reconstructed DNA string by concatenating the `payload` strings in that chain order. --- ## Part 1 (Directed chaining) Each fragment has: - `startId` (string) - `endId` (string) - `payload` (string) A fragment `i` can be followed by fragment `j` iff: - `i.endId == j.startId` ### Guarantees - All labels (`startId`/`endId`) are unique identifiers where applicable. - There exists **exactly one** valid ordering that uses **all** fragments. - The valid ordering forms one continuous chain. ### Example Fragments: - (`"AAA"`, `"AAC"`, `"AAAA"`) - (`"AGG"`, `"ACC"`, `"GGGG"`) - (`"AAC"`, `"ACT"`, `"TTTT"`) - (`"ACT"`, `"AGG"`, `"CCCC"`) Valid label chain: `AAA → AAC → ACT → AGG → ACC` Output: `"AAAATTTTCCCCGGGG"` --- ## Follow-up (Unknown direction / two tags) Now each fragment has: - `tag1` (string) - `tag2` (string) - `payload` (string) Fragment orientation is unknown: a fragment can be traversed as either `tag1 → tag2` or `tag2 → tag1`. Two consecutive fragments can be connected if they **share a tag** (i.e., the end tag of the current traversal equals a tag on the next fragment, choosing its traversal direction appropriately). ### Guarantees - At least one valid ordering exists that uses **all** fragments exactly once. - You must still use **all** fragments. ### Example Fragments: - (`A`, `B`, `"AAAA"`) - (`B`, `C`, `"TTTT"`) - (`C`, `D`, `"CCCC"`) - (`D`, `B`, `"GGGG"`) One valid tag walk: `A → B → C → D → B` Output: `"AAAATTTTCCCCGGGG"` --- ### What to return Return the concatenation of payloads in the chosen fragment order. (No extra separators; do not merge/overlap payload strings.)

Quick Answer: This question evaluates understanding of sequence reconstruction, graph modeling, and traversal/orientation constraints, focusing on ordering labeled fragments into a single non-repeating chain.

Part 1: Directed DNA Fragment Chaining

You are given an unsorted list of DNA fragments. Each fragment is a tuple `(start_id, end_id, payload)`. A fragment can be followed by another fragment only if the first fragment's `end_id` equals the second fragment's `start_id`. The fragments are guaranteed to form exactly one continuous directed chain that uses every fragment exactly once. Reconstruct the DNA by ordering the fragments correctly and concatenating their `payload` strings in chain order. If the input list is empty, return an empty string.

Constraints

  • `0 <= len(fragments) <= 200000`
  • Each `start_id` appears in at most one fragment.
  • The fragments form exactly one valid directed chain using all fragments.
  • The total length of all payload strings fits in memory.

Examples

Input: [('AAA', 'AAC', 'AAAA'), ('AGG', 'ACC', 'GGGG'), ('AAC', 'ACT', 'TTTT'), ('ACT', 'AGG', 'CCCC')]

Expected Output: "AAAATTTTCCCCGGGG"

Explanation: The unique chain is `AAA -> AAC -> ACT -> AGG -> ACC`, so the payloads are joined as `AAAA + TTTT + CCCC + GGGG`.

Input: [('X', 'Y', 'ACGT')]

Expected Output: "ACGT"

Explanation: A single fragment is already the full chain.

Input: []

Expected Output: ""

Explanation: No fragments means the reconstructed DNA is an empty string.

Input: [('s2', 's3', 'CC'), ('s0', 's1', 'AA'), ('s1', 's2', 'BB'), ('s3', 's4', 'DD')]

Expected Output: "AABBCCDD"

Explanation: The start label is `s0`, giving the chain `s0 -> s1 -> s2 -> s3 -> s4`.

Hints

  1. Think of the fragments like links in a singly linked list.
  2. The first fragment starts at the label that never appears as any fragment's `end_id`.

Part 2: Reversible DNA Fragment Chaining

You are given an unsorted list of DNA fragments. Each fragment is a tuple `(tag1, tag2, payload)`. A fragment may be traversed in either direction, so two consecutive fragments are compatible when the current traversal ends at a tag that the next fragment also has. Your task is to order the fragments so that every fragment is used exactly once, then return the concatenation of payloads in that order. For deterministic grading, if multiple full orderings exist, use this rule: 1. Build an Euler trail on the tags. 2. If there are odd-degree tags, start from the lexicographically smallest odd-degree tag. 3. Otherwise, start from the lexicographically smallest tag that appears in any fragment. 4. When exploring unused fragments from a tag, prefer smaller original input indices first. If the input list is empty, return an empty string.

Constraints

  • `0 <= len(fragments) <= 200000`
  • At least one ordering exists that uses every fragment exactly once.
  • Tags are strings; duplicate tag pairs and self-loops are allowed.
  • The total length of all payload strings fits in memory.

Examples

Input: [('A', 'B', 'AAAA'), ('B', 'C', 'TTTT'), ('C', 'D', 'CCCC'), ('D', 'B', 'GGGG')]

Expected Output: "AAAATTTTCCCCGGGG"

Explanation: The odd-degree tags are `A` and `B`, so start from `A`. The resulting trail uses edges in input-index order: 0, 1, 2, 3.

Input: []

Expected Output: ""

Explanation: No fragments means the reconstructed DNA is an empty string.

Input: [('X', 'Y', 'ACGT')]

Expected Output: "ACGT"

Explanation: A single fragment is already a valid trail.

Input: [('B', 'A', 'AA'), ('A', 'C', 'CC'), ('C', 'B', 'GG')]

Expected Output: "AAGGCC"

Explanation: All tags have even degree, so start from the lexicographically smallest tag `A`. With input-index preference, the Euler circuit yields edge order 0, 2, 1.

Input: [('A', 'A', 'X'), ('A', 'B', 'Y'), ('B', 'A', 'Z')]

Expected Output: "XYZ"

Explanation: This case includes a self-loop. Starting from `A`, the deterministic Euler trail uses edges in the order 0, 1, 2.

Hints

  1. Model each fragment as an edge in an undirected multigraph whose vertices are tags.
  2. Using every fragment exactly once is an Euler trail/circuit problem; Hierholzer's algorithm works well here.
Last updated: Jun 7, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Return Top K Open Businesses - Microsoft (hard)
  • Implement Memory Allocation and In-Memory Records - Microsoft (medium)
  • Implement K-Means and Detect Divisible Subarrays - Microsoft (medium)
  • Sort Three Categories In Place - Microsoft (medium)
  • Retain Top K Elements - Microsoft (medium)