PracHub

Quick Overview

This question evaluates a candidate's skill in log-driven debugging, production incident investigation, and cross-functional task allocation, covering competencies such as structured and contextual logging, correlation identifiers, noise reduction, minimal reproducibility, feature-flagged isolation, triage coordination, hypothesis tracking, root-cause confirmation, regression prevention, and blameless postmortems. It is commonly asked to assess operational troubleshooting and teamwork under production uncertainty within the Coding & Algorithms category, testing practical application of debugging techniques alongside conceptual understanding of incident management and process-level coordination.

  • Medium
  • DoorDash
  • Coding & Algorithms
  • Software Engineer

Debug using logs and allocate tasks

Company: DoorDash

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

You inherit a service where unit tests pass locally, but production logs show intermittent errors. Without a debugger, outline a step-by-step debugging plan that relies primarily on logs. Include: setting log levels, adding structured and contextual logging (correlation IDs), separating signal from noisy logs, building a minimal reproducible case, and using feature flags or canaries to isolate changes. Explain how you would divide and assign investigation tasks across the team, establish a triage board, and track hypotheses to resolution. Walk through how you would confirm the root cause, implement the fix, add regression tests and alerts, and run a blameless postmortem.
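For the structured and contextual logging step, here is a minimal sketch using only Python's standard `logging`, `contextvars`, and `json` modules. It assumes the log fields should mirror the record format used in the exercise (timestamp, correlation ID, level, service, feature, message). The names `correlation_id_var`, `build_logger`, `handle_request`, and the `payment`/`checkout` context are hypothetical illustrations, not part of any existing service.

```python
import contextvars
import json
import logging
import sys
import uuid

# Correlation ID of the request currently being handled; "" outside a request.
correlation_id_var = contextvars.ContextVar("correlation_id", default="")

# Map Python's level names onto the DEBUG/INFO/WARN/ERROR/FATAL scale.
LEVEL_NAMES = {"WARNING": "WARN", "CRITICAL": "FATAL"}


class CorrelationIdFilter(logging.Filter):
    """Stamps every record with the correlation ID from the current context."""

    def filter(self, record):
        record.correlation_id = correlation_id_var.get()
        return True  # never drops records, only annotates them


class JsonFormatter(logging.Formatter):
    """Emits one JSON object per line so logs can be filtered and grouped by field."""

    def format(self, record):
        return json.dumps({
            "ts": int(record.created * 1000),  # epoch milliseconds
            "correlation_id": getattr(record, "correlation_id", ""),
            "level": LEVEL_NAMES.get(record.levelname, record.levelname),
            "service": getattr(record, "service", ""),
            "feature": getattr(record, "feature", ""),
            "message": record.getMessage(),
        })


def build_logger(name, level=logging.INFO, stream=None):
    logger = logging.getLogger(name)
    logger.setLevel(level)  # lower to DEBUG temporarily while investigating
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.addFilter(CorrelationIdFilter())
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def handle_request(logger, incoming_id=None):
    # Reuse an upstream correlation ID if one was propagated, else mint a new one.
    token = correlation_id_var.set(incoming_id or uuid.uuid4().hex)
    ctx = {"service": "payment", "feature": "checkout"}  # hypothetical context
    try:
        logger.debug("payload received", extra=ctx)  # suppressed at INFO level
        logger.info("start", extra=ctx)
        logger.error("card declined", extra=ctx)
    finally:
        correlation_id_var.reset(token)  # don't leak the ID into the next request
```

Keeping the correlation ID in a `ContextVar` rather than passing it through every call means any log statement made while handling the request is tagged automatically, which is what makes the later "group by correlation ID" step possible.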


You are helping a team debug intermittent production errors using logs. Each log line is already structured and includes a correlation ID, service, feature flag/canary name, severity level, and message. Your task is to build a first-pass triage tool that filters noisy logs, groups events by correlation ID, identifies high-signal failing traces, creates investigation tasks keyed by the first failing service/feature, and assigns those tasks to engineers.

A log record is a list of 6 strings: [timestamp, correlation_id, level, service, feature, message]. The timestamp is a decimal integer stored as a string, so compare timestamps numerically rather than as strings. Severity order: DEBUG < INFO < WARN < ERROR < FATAL.

Triage rules:

  1. Ignore logs whose severity is below min_level.
  2. Ignore logs whose message contains any of these noisy phrases, case-insensitively: "heartbeat", "healthcheck", or "retry succeeded".
  3. Ignore logs with an empty correlation_id.
  4. A problem trace is a correlation_id that has at least one remaining (not ignored) ERROR or FATAL log.
  5. For each problem trace, find its earliest remaining ERROR/FATAL log by timestamp. If timestamps tie, the log that appears earlier in the input wins. The investigation task for that trace is service:feature from that earliest error log.
  6. For each task, compute:
     • trace_count: the number of problem traces assigned to the task
     • error_count: the total number of remaining ERROR/FATAL logs across those traces, including errors that the same trace logged under other services
     • top_correlation_id: the trace assigned to the task with the most ERROR/FATAL logs. Ties are broken by earlier first-error timestamp, then by lexicographically smaller correlation_id.
  7. Sort tasks by descending error_count, then descending trace_count, then lexicographically by task key.
  8. Assign tasks in sorted order. An engineer record is [name, skill_service]. An engineer is eligible for a task if skill_service equals the task's service or is "*". Choose the eligible engineer with the fewest tasks assigned so far. Break ties by preferring an exact service match over "*", then by lexicographically smaller engineer name. If no engineer is eligible, assign "UNASSIGNED".

Return one record per task, in sorted order, as [task_key, engineer_name, trace_count, error_count, top_correlation_id], with both count fields returned as strings.
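A possible single-pass implementation of these rules is sketched below. It assumes a function named `triage(logs, engineers, min_level)` whose argument order matches the example inputs; the name is illustrative. Because non-error logs never change the output, it only records ERROR/FATAL logs per trace. The running time is roughly O(n + T log T + T·E) for n logs, T tasks, and E engineers, which fits the stated limits.

```python
from collections import defaultdict

LEVEL_RANK = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3, "FATAL": 4}
NOISE = ("heartbeat", "healthcheck", "retry succeeded")


def triage(logs, engineers, min_level):
    min_rank = LEVEL_RANK[min_level]

    # Rules 1-5: drop filtered logs and track each trace's errors.
    first_error = {}                 # cid -> (timestamp, input index, task key)
    error_counts = defaultdict(int)  # cid -> number of remaining ERROR/FATAL logs
    for idx, (ts, cid, level, service, feature, message) in enumerate(logs):
        rank = LEVEL_RANK[level]
        if rank < min_rank or not cid:
            continue  # rule 1 (below min_level) or rule 3 (empty correlation_id)
        lowered = message.lower()
        if any(phrase in lowered for phrase in NOISE):
            continue  # rule 2: case-insensitive noise filter
        if rank < LEVEL_RANK["ERROR"]:
            continue  # survives filtering but cannot create or extend a problem trace
        error_counts[cid] += 1
        # Input index breaks timestamp ties, so the task key is never compared.
        candidate = (int(ts), idx, service + ":" + feature)
        if cid not in first_error or candidate < first_error[cid]:
            first_error[cid] = candidate

    # Rule 6: aggregate traces into tasks; best holds (-errors, first ts, cid).
    stats = {}  # task key -> [trace_count, error_count, best]
    for cid, (ts, _idx, task) in first_error.items():
        count = error_counts[cid]
        best = (-count, ts, cid)  # smaller tuple = better top_correlation_id
        entry = stats.get(task)
        if entry is None:
            stats[task] = [1, count, best]
        else:
            entry[0] += 1
            entry[1] += count
            entry[2] = min(entry[2], best)

    # Rule 7: one sort over all tasks.
    order = sorted(stats, key=lambda t: (-stats[t][1], -stats[t][0], t))

    # Rule 8: greedy assignment with per-engineer load tracking.
    load = {name: 0 for name, _ in engineers}
    result = []
    for task in order:
        service = task.split(":", 1)[0]  # services contain no colons
        eligible = [e for e in engineers if e[1] == service or e[1] == "*"]
        if eligible:
            # Fewest tasks, then exact match (False sorts before True), then name.
            name = min(eligible, key=lambda e: (load[e[0]], e[1] != service, e[0]))[0]
            load[name] += 1
        else:
            name = "UNASSIGNED"
        trace_count, error_count, best = stats[task]
        result.append([task, name, str(trace_count), str(error_count), best[2]])
    return result
```

Encoding each tie-break chain as a tuple key (for `top_correlation_id`, the sort, and engineer choice) keeps every rule in one comparison instead of nested conditionals.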

Constraints

  • 0 <= len(logs) <= 200000
  • Each log record has exactly 6 strings: [timestamp, correlation_id, level, service, feature, message]
  • timestamp is a non-negative integer represented as a decimal string
  • level and min_level are one of: DEBUG, INFO, WARN, ERROR, FATAL
  • 0 <= len(engineers) <= 100
  • Engineer names are unique
  • service, feature, correlation_id, engineer name, and skill_service contain no colon characters; skill_service is either a service name or the wildcard "*"

Examples

Input: ([['1','c1','INFO','auth','login','start'],['2','c1','ERROR','auth','login','db timeout'],['3','c2','WARN','payment','checkout','slow'],['4','c2','ERROR','payment','checkout','card declined'],['5','c2','ERROR','payment','checkout','retry failed'],['6','c3','INFO','auth','login','heartbeat']], [['Ana','auth'],['Ben','payment'],['Mia','*']], 'INFO')

Expected Output: [['payment:checkout','Ben','1','2','c2'],['auth:login','Ana','1','1','c1']]

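Explanation: c2 has two error logs, so payment:checkout has the highest error_count and comes first. It is assigned to payment specialist Ben, because an exact service match beats Mia's wildcard at equal load. c1 creates auth:login, which goes to Ana for the same reason. The heartbeat log is ignored as noise.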

Input: ([['10','a','DEBUG','api','v2','payload'],['11','a','ERROR','api','v2','null pointer'],['12','a','ERROR','db','query','cascade'],['13','b','ERROR','api','v2','timeout'],['14','c','FATAL','worker','sync','disk full'],['15','c','ERROR','worker','sync','healthcheck failed'],['16','d','ERROR','api','v2','retry succeeded after fail']], [['Ann','api'],['Wes','worker'],['Gus','*']], 'WARN')

Expected Output: [['api:v2','Ann','2','3','a'],['worker:sync','Wes','1','1','c']]

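Explanation: Trace a's first error is in api:v2, and a has two error logs in total (its later db:query error still counts toward api:v2). Trace b also maps to api:v2. The DEBUG log is below min_level, and the healthcheck and retry-succeeded messages are ignored as noise, so trace d produces no task. api:v2 therefore has 3 total errors and comes first.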

Input: ([['1','t1','ERROR','auth','login','x'],['2','t2','ERROR','auth','signup','x'],['3','t3','ERROR','auth','logout','x']], [['Ana','auth'],['Mia','*']], 'INFO')

Expected Output: [['auth:login','Ana','1','1','t1'],['auth:logout','Mia','1','1','t3'],['auth:signup','Ana','1','1','t2']]

Explanation: All tasks tie on error_count and trace_count, so they sort lexicographically by task key. Ana gets the first task through the exact-match tie-break, Mia gets the second because she has the lower load, and Ana gets the third through the exact-match tie-break at equal load.

Input: ([['1','x','INFO','api','v1','start'],['2','x','ERROR','api','v1','heartbeat timeout'],['3','','ERROR','api','v1','missing correlation id']], [['A','*']], 'INFO')

Expected Output: []

Explanation: The only error with a correlation ID is ignored as heartbeat noise, and the other error has an empty correlation ID, so there are no problem traces.

Input: ([['1','p','FATAL','search','index','panic']], [['Ana','auth'],['Ben','payment']], 'DEBUG')

Expected Output: [['search:index','UNASSIGNED','1','1','p']]

Explanation: The search:index task has a fatal log, but no engineer has skill search and there is no wildcard engineer, so it is unassigned.

Input: ([], [['Ana','*']], 'INFO')

Expected Output: []

Explanation: Empty logs produce no investigation tasks.

Hints

  1. Group the remaining logs by correlation ID so you can reason about each request or trace independently.
  2. After you compute task statistics with hash maps, sort the tasks once and assign them greedily while tracking each engineer's current load.
Last updated: Jun 29, 2026

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Validate a Shopping Cart - DoorDash (medium)
  • Calculate Driver Payments - DoorDash (medium)
  • Implement Timeout Refund Workflow - DoorDash (medium)
  • Maximize Chef Assignment Profit - DoorDash (medium)
  • Compute Courier Delivery Pay - DoorDash (easy)