PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches

Quick Overview

This question evaluates algorithm design and complexity-analysis skills across search and graph problems, specifically log search for first-error detection, reachability and failure propagation in directed service dependency graphs, and longest-path reasoning under cycle or DAG assumptions.

  • Medium
  • Snowflake
  • Coding & Algorithms
  • Software Engineer

Design error detection and propagation algorithms

Company: Snowflake

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

You are given three related sub-problems about error detection and propagation. A) Log search: You have a chronologically ordered array of log lines. Each line starts with one of the exact prefixes "[Info]", "[Warn]", or "[Error]". Guarantees: after the first "[Error]" appears, every subsequent line is an "[Error]"; immediately before the first "[Error]" there is at least one "[Warn]". Find the index of the first "[Error]" using as few reads as possible. Implement the function and state time and space complexities. Clarify what you return if no error exists. B) Failure fan-out: You are given a directed service dependency graph where an edge U -> V means "U calls V". When a service fails, all upstream callers that depend on it eventually fail. Given the ID of the first failing service and the graph (e.g., as an adjacency list of callers-to-callees), output the set of all services that will eventually fail (order not important). Describe and implement an algorithm and analyze its complexity. C) Longest failure chain (follow-up): Using the same graph and initial failing service, output one of the longest error-propagation chains starting from that service (i.e., an ordered list of services describing a longest path of cascading failures along the propagation direction). State any assumptions about cycles (e.g., treat the graph as a DAG or search for the longest simple path), justify your choice, and provide the algorithm with complexity analysis.

Quick Answer: This question evaluates algorithm design and complexity-analysis skills across search and graph problems, specifically log search for first-error detection, reachability and failure propagation in directed service dependency graphs, and longest-path reasoning under cycle or DAG assumptions.

Part 1: First Error in Monotonic Logs

You are given a chronologically ordered list of log lines. Every line starts with exactly one of these prefixes: '[Info]', '[Warn]', or '[Error]'. If an '[Error]' ever appears, then every line after the first error is also an '[Error]'. Also, if an error exists, the line immediately before the first '[Error]' is guaranteed to be a '[Warn]'. Return the index of the first '[Error]' while using as few log reads as possible. If no error exists, return -1.

Constraints

  • 0 <= len(logs) <= 10^6
  • Each log line starts with exactly one of: '[Info]', '[Warn]', '[Error]'
  • The array is in chronological order
  • If logs[i] starts with '[Error]', then all logs[j] for j > i also start with '[Error]'
  • If any error exists, its index is at least 1 and logs[first_error_index - 1] starts with '[Warn]'

Examples

Input: ["[Info] boot", "[Warn] retrying", "[Error] disk failed", "[Error] shutting down"]

Expected Output: 2

Explanation: The first error appears at index 2.

Input: ["[Info] start", "[Info] running", "[Warn] slow response"]

Expected Output: -1

Explanation: There is no '[Error]' line, so return -1.

Input: []

Expected Output: -1

Explanation: Empty input contains no errors.

Input: ["[Warn] almost full", "[Error] out of memory"]

Expected Output: 1

Explanation: The smallest valid case with an error has the first error at index 1.

Solution

def solution(logs):
    left = 0
    right = len(logs) - 1
    answer = -1

    while left <= right:
        mid = (left + right) // 2
        if logs[mid].startswith('[Error]'):
            answer = mid
            right = mid - 1
        else:
            left = mid + 1

    return answer

Time complexity: O(log n). Space complexity: O(1).

Hints

  1. Turn each line into a boolean question: 'Is this line an error?' What special pattern does that boolean array have?
  2. You are looking for the first position where a monotonic condition becomes true.

Part 2: Failure Fan-Out in a Service Graph

You are given a directed service dependency graph as an adjacency list. An edge U -> V means service U calls service V. If a service fails, then every caller that depends on it will eventually fail too, and that failure can continue propagating further upstream. Given the ID of the first failing service, return the set of all services that will eventually fail, including the initial service.

Constraints

  • 0 <= number of services <= 2 * 10^5
  • 0 <= number of edges <= 4 * 10^5
  • Service IDs are strings
  • The graph may be disconnected
  • The graph may contain cycles

Examples

Input: ({'A': ['B', 'C'], 'B': ['D'], 'E': ['B']}, 'D')

Expected Output: {'A', 'B', 'D', 'E'}

Explanation: D fails first, so B fails. Then both A and E, which depend on B, also fail.

Input: ({'Auth': ['DB'], 'API': ['Auth'], 'Worker': ['Queue']}, 'DB')

Expected Output: {'API', 'Auth', 'DB'}

Explanation: DB failure propagates to Auth and then to API. Worker is unaffected.

Input: ({'A': ['B'], 'B': ['C'], 'C': ['A']}, 'C')

Expected Output: {'A', 'B', 'C'}

Explanation: The graph contains a cycle, but visited tracking prevents infinite looping.

Input: ({}, 'X')

Expected Output: {'X'}

Explanation: If the graph is empty, only the initially failed service is known to fail.

Solution

def solution(graph, failed_service):
    from collections import defaultdict, deque

    reverse = defaultdict(list)
    for caller, callees in graph.items():
        for callee in callees:
            reverse[callee].append(caller)

    failed = {failed_service}
    queue = deque([failed_service])

    while queue:
        service = queue.popleft()
        for caller in reverse.get(service, []):
            if caller not in failed:
                failed.add(caller)
                queue.append(caller)

    return failed

Time complexity: O(V + E). Space complexity: O(V + E).

Hints

  1. Failure travels in the opposite direction of the given edges.
  2. If the input gives caller -> callee, what graph do you need to traverse to move from a failed service to all affected callers?

Part 3: Longest Failure Propagation Chain in a DAG

You are given the same service dependency graph, where U -> V means U calls V, and a failed starting service. Failure propagates upstream from callees to callers. Return one of the longest possible failure-propagation chains starting from the initial failed service, as an ordered list of service IDs. For this problem, assume the dependency graph is a DAG. This assumption is necessary because finding the longest simple path in a general directed graph is NP-hard. If multiple longest chains exist, returning any one of them is acceptable.

Constraints

  • 0 <= number of services <= 2 * 10^5
  • 0 <= number of edges <= 4 * 10^5
  • Service IDs are strings
  • The dependency graph is a DAG
  • If failed_service has no upstream callers, the answer is [failed_service]

Examples

Input: ({'A': ['B'], 'B': ['C'], 'D': ['C'], 'E': ['D'], 'F': ['E']}, 'C')

Expected Output: ['C', 'D', 'E', 'F']

Explanation: From C, one chain is C -> B -> A, but the longer one is C -> D -> E -> F.

Input: ({'API': ['Auth', 'Cache'], 'Web': ['API'], 'Mobile': ['API'], 'Frontend': ['Web'], 'Auth': ['DB']}, 'DB')

Expected Output: ['DB', 'Auth', 'API', 'Web', 'Frontend']

Explanation: The longest upstream failure chain goes from DB to Auth to API to Web to Frontend.

Input: ({'A': ['B']}, 'A')

Expected Output: ['A']

Explanation: A has no upstream callers in the propagation direction, so the chain contains only A.

Input: ({}, 'X')

Expected Output: ['X']

Explanation: With no graph edges, the longest chain is just the starting service.

Solution

def solution(graph, failed_service):
    from collections import defaultdict, deque

    reverse = defaultdict(list)
    for caller, callees in graph.items():
        for callee in callees:
            reverse[callee].append(caller)

    reachable = {failed_service}
    stack = [failed_service]
    while stack:
        service = stack.pop()
        for caller in reverse.get(service, []):
            if caller not in reachable:
                reachable.add(caller)
                stack.append(caller)

    indegree = {service: 0 for service in reachable}
    for service in reachable:
        for caller in reverse.get(service, []):
            if caller in reachable:
                indegree[caller] += 1

    dist = {service: float('-inf') for service in reachable}
    parent = {failed_service: None}
    dist[failed_service] = 0
    end = failed_service

    queue = deque([failed_service])
    while queue:
        service = queue.popleft()
        for caller in reverse.get(service, []):
            if caller in reachable and dist[service] + 1 > dist[caller]:
                dist[caller] = dist[service] + 1
                parent[caller] = service
                if dist[caller] > dist[end]:
                    end = caller
            if caller in reachable:
                indegree[caller] -= 1
                if indegree[caller] == 0:
                    queue.append(caller)

    path = []
    current = end
    while current is not None:
        path.append(current)
        current = parent.get(current)

    return path[::-1]

Time complexity: O(V + E). Space complexity: O(V + E).

Hints

  1. Reverse the edges so they point in the same direction as failure propagation.
  2. In a DAG, longest paths can be computed with dynamic programming over a topological order.
Last updated: Apr 30, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Solve Array Distance and Wiki Navigation - Snowflake (medium)
  • Implement Document Predicate APIs - Snowflake (medium)
  • Find Shortest Wiki Click Path - Snowflake (medium)
  • Schedule prerequisite classes with retakes - Snowflake (easy)
  • Solve three coding rounds - Snowflake (medium)