This question evaluates a candidate's skill in log-driven debugging, production incident investigation, and cross-functional task allocation. It covers structured and contextual logging, correlation identifiers, noise reduction, minimal reproduction, feature-flagged isolation, triage coordination, hypothesis tracking, root-cause confirmation, regression prevention, and blameless postmortems. Filed under the Coding & Algorithms category, it is commonly asked to assess operational troubleshooting and teamwork under production uncertainty, and it tests both practical debugging technique and a conceptual understanding of incident management and process-level coordination.

You inherit a service where unit tests pass locally, but production logs show intermittent errors. Without a debugger, outline a step-by-step debugging plan that relies primarily on logs. Include: setting log levels, adding structured and contextual logging (correlation IDs), separating signal from noise in the logs, building a minimal reproducible case, and using feature flags or canaries to isolate changes. Explain how you would divide and assign investigation tasks across the team, establish a triage board, and track hypotheses to resolution. Walk through how you would confirm root cause, implement the fix, add regression tests/alerts, and run a blameless postmortem.
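
As one possible illustration of the logging steps named in the question (log levels, structured output, correlation IDs, and filtering noisy logs down to a single request), here is a minimal Python sketch using only the standard library. The service name `orders`, the `handle_request` handler, the `context` key passed through `extra`, and the `lines_for` helper are all hypothetical names chosen for the example. A real service would likely take the correlation ID from an incoming request header and ship logs to an aggregator instead of an in-memory stream.

```python
import contextvars
import io
import json
import logging
import uuid

# Holds the correlation ID for the request currently being handled (hypothetical name).
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Copy the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", "-"),
        }
        # Extra fields arrive via extra={"context": {...}} (a convention assumed here).
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream, level=logging.INFO):
    """Create a logger that writes structured lines to `stream` at `level`."""
    logger = logging.getLogger("orders")  # hypothetical service name
    logger.handlers.clear()
    logger.setLevel(level)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(CorrelationFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, order_id, incoming_id=None):
    """Hypothetical handler; reuses an upstream ID when one is supplied."""
    token = correlation_id.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.debug("loading order", extra={"context": {"order_id": order_id}})
        logger.info("order processed", extra={"context": {"order_id": order_id}})
    finally:
        correlation_id.reset(token)


def lines_for(log_text, wanted_id):
    """Reduce noisy output to the records belonging to one request."""
    records = [json.loads(line) for line in log_text.splitlines() if line]
    return [r for r in records if r["correlation_id"] == wanted_id]


if __name__ == "__main__":
    buf = io.StringIO()
    log = build_logger(buf, level=logging.DEBUG)  # raise verbosity while investigating
    handle_request(log, order_id=42, incoming_id="req-a")
    handle_request(log, order_id=43, incoming_id="req-b")
    for rec in lines_for(buf.getvalue(), "req-a"):
        print(rec)
```

Keeping the ID in a `contextvars.ContextVar` means call sites do not have to pass it explicitly, and emitting one JSON object per line lets the same filter run over production logs with ordinary text tools. Temporarily raising the level to `DEBUG` for one logger, rather than globally, is one way to add detail without drowning out the signal.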