Validate course catalog dependencies
Company: Google
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of graph algorithms and data validation, specifically dependency analysis, cycle detection, and topological ordering.
Constraints
- 0 <= number of courses <= 10^5
- 0 <= number of prerequisite pairs <= 2 * 10^5
- Course IDs are distinct strings within course_ids.
- A prerequisite pair (u, v) may reference an ID not present in course_ids; such IDs must be reported in `missing`.
- A pair (u, v) means u must be completed before v (directed edge u -> v).
Examples
Input: (['A', 'B', 'C'], [('A', 'B'), ('B', 'C')])
Expected Output: (True, ['A', 'B', 'C'], [], [])
Explanation: Linear chain A->B->C is acyclic with all references present, so it is valid and the unique topological order is [A, B, C].
Input: (['A', 'B', 'C'], [('A', 'B'), ('B', 'C'), ('C', 'A')])
Expected Output: (False, [], [], ['A', 'B', 'C', 'A'])
Explanation: A->B->C->A forms a cycle, so the catalog is invalid, no ordering is returned, and one detected cycle is A->B->C->A.
Input: (['A', 'B'], [('A', 'B'), ('A', 'X')])
Expected Output: (False, ['A', 'B'], ['X'], [])
Explanation: X is referenced as requiring A but is not in the catalog, so it is a missing reference. The existing graph (A->B) is acyclic, so an ordering of existing courses is still produced, but the catalog is invalid.
Input: ([], [])
Expected Output: (True, [], [], [])
Explanation: An empty catalog with no prerequisites is trivially valid with an empty ordering.
Input: (['A'], [])
Expected Output: (True, ['A'], [], [])
Explanation: A single course with no prerequisites is valid; its ordering is just [A].
Input: (['A', 'B', 'C', 'D'], [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')])
Expected Output: (True, ['A', 'B', 'C', 'D'], [], [])
Explanation: A diamond DAG (A before B and C, both before D). With lexicographic tie-breaking the order is [A, B, C, D].
Input: (['A', 'B'], [('A', 'B'), ('B', 'A')])
Expected Output: (False, [], [], ['A', 'B', 'A'])
Explanation: A and B require each other, forming a 2-node cycle. The catalog is invalid and one detected cycle is A->B->A.
Hints
- Separate the two failure modes: a missing reference (an ID in some pair that is not in the catalog) and a cycle (a circular dependency). Scan all pairs once to collect missing IDs before touching the graph.
- Build the dependency graph only over courses that actually exist, with edge u -> v for each pair (u, v). Track in-degree per node.
- Use Kahn's algorithm (BFS on in-degree-zero nodes). If the produced ordering contains every catalog course, the graph is acyclic; if it is shorter, a cycle exists among the nodes that never reached in-degree zero.
- To report one concrete cycle, run a DFS restricted to the nodes that still have positive in-degree after Kahn's sort; the first gray (on-stack) node you revisit closes a cycle — slice it out of the current DFS path.
- For deterministic output, break ties in lexicographic order whenever multiple nodes are simultaneously ready.