Highlight Exact Source Matches
Company: Harvey
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates string-processing and algorithmic competencies, specifically exact substring matching, interval merging for overlapping or adjacent matches, and robust text transformation to produce correctly tagged output.
Part 1: Wrap Exact Source Matches in <yellow> Tags
Constraints
- 0 <= len(text) <= 2000
- 0 <= len(sources) <= 100
- 0 <= len(source) <= 50 for each source in sources
- Matching is exact substring matching and is case-sensitive
- Overlapping or adjacent matches must be merged into one highlighted region
Examples
Input: ("abcxyz123", ["abc", "xyz", "z12"])
Expected Output: "<yellow>abcxyz12</yellow>3"
Explanation: Matches are [0,3), [3,6), and [5,8). They merge into [0,8).
Input: ("foobarbaz", ["foo", "baz"])
Expected Output: "<yellow>foo</yellow>bar<yellow>baz</yellow>"
Explanation: There are two separate highlighted regions.
Input: ("aaab", ["aa", "aab"])
Expected Output: "<yellow>aaab</yellow>"
Explanation: Matches overlap and extend to cover the whole string.
Input: ("", ["abc"])
Expected Output: ""
Explanation: Empty text produces an empty result.
Input: ("hello", [])
Expected Output: "hello"
Explanation: With no sources, nothing is highlighted.
Hints
- Think about marking which character positions belong to at least one match.
- While scanning left to right, keep track of the farthest match end seen so far. This naturally merges overlaps and adjacency.
Part 2: Count Exact Occurrences of Each Source
Constraints
- 0 <= len(text) <= 2000
- 0 <= len(sources) <= 100
- 0 <= len(source) <= 50 for each source in sources
- Matching is exact substring matching and is case-sensitive
- Overlapping matches count as separate occurrences
Examples
Input: ("abcxyz123", ["abc", "xyz", "z12", "no"])
Expected Output: [1, 1, 1, 0]
Explanation: The first three sources appear once each, and no does not appear.
Input: ("aaaaa", ["aa", "aaa", "b"])
Expected Output: [4, 3, 0]
Explanation: Overlapping matches count: aa appears at 0,1,2,3 and aaa appears at 0,1,2.
Input: ("aAaA", ["aA", "Aa", "aa"])
Expected Output: [2, 1, 0]
Explanation: Matching is case-sensitive.
Input: ("", ["a", "aa"])
Expected Output: [0, 0]
Explanation: An empty text contains no non-empty substrings.
Input: ("hello", [])
Expected Output: []
Explanation: No sources means no counts to return.
Hints
- For each source, check every possible starting index in text.
- Do not skip forward by the full source length after a match if overlaps should count.
Part 3: Contributors for Each Merged Highlight Region
Constraints
- 0 <= len(text) <= 2000
- 0 <= len(sources) <= 100
- 0 <= len(source) <= 50 for each source in sources
- Matching is exact substring matching and is case-sensitive
- Overlapping or adjacent intervals must be merged
- The contributors list for a region should contain each distinct source string at most once
Examples
Input: ("abcxyz123", ["abc", "xyz", "z12"])
Expected Output: [(0, 8, ["abc", "xyz", "z12"])]
Explanation: All three matches connect through overlap or adjacency, so they form one merged region.
Input: ("aaXXbbYYaa", ["aa", "bb", "YY"])
Expected Output: [(0, 2, ["aa"]), (4, 10, ["aa", "bb", "YY"])]
Explanation: The first aa is separate. The later bb, YY, and aa are adjacent and merge into one region. Contributors follow the input source order.
Input: ("mississippi", ["iss", "ssi", "ppi"])
Expected Output: [(1, 11, ["iss", "ssi", "ppi"])]
Explanation: A chain of overlaps and adjacency merges everything from index 1 through 10.
Input: ("abcabc", ["abc", "abc", "bc"])
Expected Output: [(0, 6, ["abc", "bc"])]
Explanation: Duplicate source strings count as one contributor label in the final region.
Input: ("hello", ["x", "y"])
Expected Output: []
Explanation: No matches means there are no merged highlight regions.
Hints
- Store the source string together with every interval you find.
- After sorting intervals by start index, merge them while maintaining a set of which sources contributed to the current region.