Describe challenges and give constructive feedback
Company: DoorDash
Role: Software Engineer
Category: Behavioral & Leadership
Difficulty: medium
Interview Round: Technical Screen
1) Describe a challenging project you drove: what specifically made it challenging, how you prioritized trade-offs, and the impact you achieved.
2) Explain a time you handled a significant product issue (detection, triage, root cause analysis, stakeholder communication, and preventive actions).
3) Share an example of giving and receiving constructive feedback, including how you prepared, delivered/received it, and what changed as a result.
Quick Answer: This question evaluates a software engineer's leadership, cross-team collaboration, incident management, prioritization, and constructive feedback skills by eliciting concrete examples of challenging projects, production incidents, and feedback interactions.
Solution
# How to approach these questions, with example answers (Software Engineer)
Use structured frameworks and quantify outcomes. Helpful tools:
- STAR/CAR for stories: Situation/Task → Action → Result.
- Incident model: Detect → Triage → RCA → Communicate → Prevent.
- Feedback model: SBI (Situation–Behavior–Impact) + Ask–Agree–Act.
- Quantify: Impact% = (baseline − after) / baseline × 100%.
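As a quick sanity check of the Impact% formula, here is a minimal Python sketch; the inputs are the p95 latency figures from the example in section 1 (illustrative only).

```python
# Minimal sketch of the Impact% formula above; inputs are the p95 latency
# figures from the example in section 1 (illustrative only).
def impact_pct(baseline: float, after: float) -> float:
    """Relative improvement as a percentage of the baseline."""
    return (baseline - after) / baseline * 100

print(f"{impact_pct(2.4, 0.7):.0f}%")  # 2.4s -> 0.7s is a ~71% reduction
```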
## 1) Challenging project you drove
Approach
- Situation/Task: One sentence on problem and stakes (SLOs, timelines, customers).
- Challenges: Ambiguity, scale, cross‑team dependencies, legacy constraints, compliance.
- Trade‑offs: Name 2–3 with reasoning (e.g., speed vs. robustness, build vs. buy, scope vs. risk). Optionally mention a simple prioritization method (RICE/WSJF).
- Actions: What you personally led/decided. Call out rollout/validation (A/B, canary, load tests).
- Results: Concrete metrics and business/user impact.
Example (adapt to your experience)
- Situation: Our real‑time dispatch scoring service was hitting p95 latency ~2.4s during peaks, causing missed ETAs and higher drop‑offs before checkout. Execs set a target of <0.8s p95 in 8 weeks.
- What made it challenging:
- High reliability requirements (99.9% SLO) under bursty traffic.
- Cross‑team dependencies (infra, data science, mobile).
- Ambiguous data quality in upstream events; legacy code lacked clear ownership.
- Trade‑offs and prioritization:
- Scope vs. speed: We deferred two lower‑priority ML features to v2 and prioritized a lean rules engine plus a simpler model for v1. Used RICE to sequence work (highest Reach/Confidence first).
- Build vs. buy: Chose managed Kafka for event streaming over self‑managed to reduce operational risk and gain replay/backfill. Higher unit cost, lower execution risk.
- Consistency vs. availability: Accepted eventual consistency with idempotency keys and at‑least‑once semantics to keep the system available under partitions.
- Actions:
- Led the service rewrite to async, event‑driven architecture; added caching and batch scoring to cut hot path calls.
- Introduced circuit breakers and timeouts; implemented a staged canary rollout (5%→25%→100%) behind a feature flag (see the sketch after this example).
- Built load tests (3× peak) and a rollback playbook; instrumented p50/p95/p99, error rate, QPS, and saturation.
- Results:
- p95 latency: 2.4s → 0.7s (−71%). Error rate: 1.8% → 0.3% (−83%).
- Orders meeting promised ETA: +4 percentage points; cart drop‑offs −6% at peak.
- Infra cost: −18% from caching and right‑sizing; on‑call pages dropped from 6/month to 1/month.
- Validated via 50/50 A/B for 2 weeks; guardrails on error budget protected user experience.
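To make the staged canary behind a feature flag concrete, here is a minimal sketch; the function name, stage percentages, and kill switch are illustrative assumptions, not the actual rollout tooling.

```python
import hashlib

# Hypothetical sketch of a staged canary (5% -> 25% -> 100%) behind a
# feature flag; names and thresholds are illustrative assumptions.
CANARY_STAGES = [5, 25, 100]  # percent of traffic routed to the new scoring path

def use_new_scorer(request_id: str, stage_pct: int, kill_switch: bool = False) -> bool:
    """Deterministically bucket a request into the canary by hashing its id."""
    if kill_switch:  # rollback playbook: flip the flag, no redeploy needed
        return False
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return bucket < stage_pct

# e.g., first stage: roughly 5% of requests take the new path
print(use_new_scorer("order-12345", CANARY_STAGES[0]))
```

Hashing the request id keeps bucketing deterministic, so a given order stays on the same path across retries while the stage percentage is ramped up.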
Pitfalls to avoid
- Over‑optimizing latency without SLOs or guardrails; rolling out globally without canaries.
- Deferring observability; skipping load tests representative of burst patterns.
## 2) Significant product issue you handled
Approach: Detect → Triage → RCA → Communicate → Prevent.
Example (adapt to your experience)
- Detection: A pager alert fired (payment 5xx rate > 2% for 5 minutes) alongside a dip in auth success on the dashboard; synthetic checks against the payment sandbox were also failing.
- Triage:
- Declared a SEV‑1, opened an incident channel, and assigned roles (incident commander, comms lead, scribe).
- Mitigations: Rolled back the latest release; toggled a feature flag to route traffic to the previous payment path; temporarily scaled out the service; enabled a circuit breaker to shed load and protect dependencies.
- Impact containment: Error rate dropped below 0.5% within 12 minutes.
- Root cause analysis:
- New gRPC client defaults had a 5s deadline, no retries; upstream partner latency spiked to ~2s, saturating our thread pool and causing request queue build‑up.
- A connection pool misconfiguration amplified the contention under burst traffic.
- Verified by replaying traffic in staging with fault injection (latency + timeouts) reproducing the pattern. Correlated deploy timestamp with error spikes in logs/traces.
- Communication:
- Posted status updates every 15 minutes to stakeholders (support/ops/leadership) with current impact, mitigation, ETA, and next update time.
- After recovery, sent a concise summary and scheduled a blameless postmortem within 48 hours.
- Preventive actions:
- Engineering: Set sane client defaults (timeouts, retries with jittered backoff, circuit breakers; see the sketch below), increased and right‑sized connection pools, added bulkheads.
- Release guardrails: Mandatory canary with automated error/latency checks; typed/validated config; kill‑switch coverage for critical features.
- Observability: SLOs and error budgets for payment success; synthetic probes against partners; improved tracing on external calls.
- Process: Updated runbook; game‑day drills to rehearse failure modes.
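A minimal sketch of "retries with jittered backoff" under a bounded per-attempt deadline, as referenced in the preventive actions above; the function and parameter names are illustrative, not a specific gRPC or internal API.

```python
import random
import time

# Hypothetical sketch: bounded per-attempt timeout plus full-jitter
# exponential backoff between retries. Names and values are illustrative.
def call_with_retries(call, attempts=3, base_delay=0.1, max_delay=1.0, timeout=0.8):
    """Retry a flaky remote call, sleeping a random (jittered) delay between tries."""
    for attempt in range(attempts):
        try:
            return call(timeout=timeout)  # each attempt respects the deadline
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # retry budget exhausted: surface the error to the caller
            # full jitter: random fraction of the capped exponential delay
            time.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

The jitter spreads retries out in time, which avoids synchronized retry storms against an already-struggling upstream partner.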
Pitfalls to avoid
- Anchoring on the most recent change without verifying causality; skipping staged rollbacks.
- Under‑communicating (no clear next update time) or over‑promising ETAs.
## 3) Constructive feedback (giving and receiving)
Framework
- SBI: Situation → Behavior → Impact. Add Ask–Agree–Act: ask for perspective, agree on next steps, and set a follow‑up.
- For receiving: Seek specific examples, summarize what you heard, propose changes, and follow up with results.
Giving feedback example
- Preparation: Collected 3 examples of very large PRs (>1,000 LOC) that slowed reviews (median lead time 3.5 days). Clarified the shared goal: faster delivery with quality.
- Delivery (1:1, SBI + Ask):
- Situation: "In the last two sprints…"
- Behavior: "PRs combined refactors and features into single large changes."
- Impact: "Reviewers struggled to provide depth, and we missed two sprint goals."
- Ask: "What constraints led to this? Can we try smaller, topic‑focused PRs?"
- Result: Agreed on max ~300 LOC per PR, feature flags for incremental delivery, and a checklist. After two sprints: PR size −60%, review time −50%, on‑time delivery +30%. We shared the practice at eng sync.
Receiving feedback example
- Situation: My weekly status updates were too detailed, making it hard for stakeholders to spot risk.
- How I received it: Asked for examples of effective status formats and what "good" looks like for execs.
- Changes: Adopted a three‑bullet pyramid (what changed, risks/blocks, next steps), included one metric and one date per item, and moved deep detail to a link.
- Outcome: Stakeholder pings dropped noticeably; two cross‑team dependencies were flagged a week earlier than before; our project hit the next milestone on time.
General tips
- Be specific, timely, and behavior‑focused; separate person from behavior.
- For tough feedback, ask permission, state intent, and co‑create next steps with an explicit follow‑up date.
- Track outcomes to close the loop (e.g., review time, defect rate, stakeholder escalations).