Filter hierarchical paths after deletions
Company: Snowflake
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency in hierarchical string processing, prefix-based filtering, edge-case handling (duplicates, trailing slashes, empty inputs), data-structure selection, and algorithmic time/space complexity analysis within the Coding & Algorithms domain, emphasizing practical implementation over purely conceptual reasoning.
Constraints
- 0 <= len(paths), len(deletes) <= 10^5
- Each path/delete string length up to ~200 characters
- Segments are separated by '/'; leading, trailing, and duplicate slashes may appear and must be ignored when comparing
- An empty delete entry (e.g. "" or "/") matches nothing and is skipped
- Comparison is by full path segments, not raw substring prefixes
Examples
Input: (["California", "California/San Francisco", "California/San Francisco/7th Street"], ["California/San Francisco"])
Expected Output: ["California"]
Explanation: "California/San Francisco" equals the delete; "California/San Francisco/7th Street" is its descendant; "California" is an ancestor and survives.
Input: (["California", "California/San Francisco", "Nevada", "Nevada/Reno"], ["California", "Nevada/Reno"])
Expected Output: ["Nevada"]
Explanation: Deleting "California" removes it and all descendants; deleting "Nevada/Reno" removes only that leaf, leaving "Nevada".
Input: (["A", "A/B", "A/B", "A/C"], ["A/B"])
Expected Output: ["A", "A/C"]
Explanation: Both duplicate "A/B" entries are removed; "A" and "A/C" are unaffected ("A/C" is not under "A/B").
Input: (["California/", "California/San Francisco/", "California/San Francisco/7th Street"], ["California/San Francisco"])
Expected Output: ["California/"]
Explanation: Trailing slashes are ignored for comparison, so "California/San Francisco/" matches the delete, but the surviving "California/" is returned with its original trailing slash intact.
Input: ([], ["X"])
Expected Output: []
Explanation: Empty paths array yields an empty result regardless of deletes.
Input: (["A", "A/B"], [])
Expected Output: ["A", "A/B"]
Explanation: Empty deletes array removes nothing; all paths survive.
Input: (["A/B/C", "A/B", "A", "AB", "AB/C"], ["A/B"])
Expected Output: ["A", "AB", "AB/C"]
Explanation: Boundary case: "AB" and "AB/C" are NOT descendants of "A/B" (segment-prefix, not substring-prefix), so only "A/B" and "A/B/C" are removed.
Hints
- Normalize each path by splitting on '/' and dropping empty segments, so "California/San Francisco/" and "California//San Francisco" both become ["California", "San Francisco"]. This handles trailing/duplicate slashes.
- Put the normalized delete paths into a hash set (as tuples of segments). A path should be removed if ANY of its segment-prefixes (including the full path) is in that set.
- Why prefixes? Path P is a descendant of delete D exactly when D's segments are a prefix of P's segments. Checking every prefix of P against the set covers both 'equal to' and 'descendant of'.
- Keep the ORIGINAL string for output even though you compare on the normalized form — the test expects "California/" back, not "California".
- A Trie of segments is an alternative: insert each delete path, mark its terminal node as deleted, then walk each path and drop it if you pass through any deleted terminal node.