Debug using logs and allocate tasks
Company: DoorDash
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates a candidate's skill in log-driven debugging, production incident investigation, and cross-functional task allocation, covering competencies such as structured and contextual logging, correlation identifiers, noise reduction, minimal reproducibility, feature-flagged isolation, triage coordination, hypothesis tracking, root-cause confirmation, regression prevention, and blameless postmortems. It is commonly asked to assess operational troubleshooting and teamwork under production uncertainty within the Coding & Algorithms category, testing practical application of debugging techniques alongside conceptual understanding of incident management and process-level coordination.
Constraints
- 0 <= len(logs) <= 200000
- Each log record has exactly 6 strings: [timestamp, correlation_id, level, service, feature, message]
- timestamp is a non-negative integer represented as a decimal string
- level and min_level are one of: DEBUG, INFO, WARN, ERROR, FATAL
- 0 <= len(engineers) <= 100
- Engineer names are unique
- service, feature, correlation_id, engineer name, and skill_service contain no colon characters except skill_service may be "*"
Examples
Input: ([['1','c1','INFO','auth','login','start'],['2','c1','ERROR','auth','login','db timeout'],['3','c2','WARN','payment','checkout','slow'],['4','c2','ERROR','payment','checkout','card declined'],['5','c2','ERROR','payment','checkout','retry failed'],['6','c3','INFO','auth','login','heartbeat']], [['Ana','auth'],['Ben','payment'],['Mia','*']], 'INFO')
Expected Output: [['payment:checkout','Ben','1','2','c2'],['auth:login','Ana','1','1','c1']]
Explanation: c2 has two error logs, so payment:checkout is highest priority and is assigned to payment specialist Ben. c1 creates auth:login and is assigned to Ana. The heartbeat log is ignored as noise.
Input: ([['10','a','DEBUG','api','v2','payload'],['11','a','ERROR','api','v2','null pointer'],['12','a','ERROR','db','query','cascade'],['13','b','ERROR','api','v2','timeout'],['14','c','FATAL','worker','sync','disk full'],['15','c','ERROR','worker','sync','healthcheck failed'],['16','d','ERROR','api','v2','retry succeeded after fail']], [['Ann','api'],['Wes','worker'],['Gus','*']], 'WARN')
Expected Output: [['api:v2','Ann','2','3','a'],['worker:sync','Wes','1','1','c']]
Explanation: Trace a's first error is api:v2 and it has two error logs total. Trace b also maps to api:v2. The healthcheck and retry-succeeded messages are ignored. api:v2 has 3 total errors, so it is first.
Input: ([['1','t1','ERROR','auth','login','x'],['2','t2','ERROR','auth','signup','x'],['3','t3','ERROR','auth','logout','x']], [['Ana','auth'],['Mia','*']], 'INFO')
Expected Output: [['auth:login','Ana','1','1','t1'],['auth:logout','Mia','1','1','t3'],['auth:signup','Ana','1','1','t2']]
Explanation: All tasks have equal priority, so they sort lexicographically by task key. Ana gets the first task due to exact-match tie-break, Mia gets the second because she has lower load, and Ana gets the third due to exact-match tie-break at equal load.
Input: ([['1','x','INFO','api','v1','start'],['2','x','ERROR','api','v1','heartbeat timeout'],['3','','ERROR','api','v1','missing correlation id']], [['A','*']], 'INFO')
Expected Output: []
Explanation: The only error with a correlation ID is ignored as heartbeat noise, and the other error has an empty correlation ID, so there are no problem traces.
Input: ([['1','p','FATAL','search','index','panic']], [['Ana','auth'],['Ben','payment']], 'DEBUG')
Expected Output: [['search:index','UNASSIGNED','1','1','p']]
Explanation: The search:index task has a fatal log, but no engineer has skill search and there is no wildcard engineer, so it is unassigned.
Input: ([], [['Ana','*']], 'INFO')
Expected Output: []
Explanation: Empty logs produce no investigation tasks.
Hints
- Group the remaining logs by correlation ID so you can reason about each request or trace independently.
- After you compute task statistics with hash maps, sort the tasks once and assign them greedily while tracking each engineer's current load.