Describe your proudest project and dive deep
Company: Coupang
Role: Software Engineer
Category: Behavioral & Leadership
Difficulty: medium
Interview Round: Onsite
What is the project you are most proud of? Walk me through the problem you set out to solve, your specific role and responsibilities, the key technical and non-technical challenges, major design or prioritization trade-offs you made, how you evaluated success (metrics, impact, or lessons), and what you would do differently if you started again.
Quick Answer: This question evaluates leadership, ownership, technical communication, and impact measurement by requiring articulation of the problem, scope of responsibility, technical and non-technical challenges, trade-offs, and measurable outcomes.
Solution
# How to Structure a Top-Tier Answer (STAR-L)
Use a clear narrative:
- Situation: One-line context and why it mattered.
- Task: Your objective and constraints.
- Action: What you did (design, implementation, leadership) and why.
- Result: Quantified impact and how you measured it.
- Learning: What you’d do differently next time.
Keep it to 3–5 minutes, with one or two areas of technical depth you can drill into.
## Sample Answer (Software Engineer, e-commerce, real-time systems)
Situation
- During high-traffic promotions, our checkout oversold popular SKUs, causing order cancellations and support tickets. Baseline cancellation from oversell was ~3.2%, hurting customer trust and GMV.
Task
- My goal was to eliminate oversells without hurting checkout latency. Success criteria: reduce cancellations by >70%, keep P99 checkout reservation latency <30 ms, and roll out with zero downtime.
Action
- I led a 4-engineer effort across checkout, inventory, and platform. I drove the design, implemented the critical reservation path, and coordinated the migration.
- Design choices and trade-offs:
  - Consistency vs latency: We chose an eventually consistent approach with atomic reservations and short TTLs over globally serializing DB writes, which could not meet our latency SLOs.
  - Architecture: Added a reservation layer on a Redis cluster. A single atomic Lua script performed reserve-if-available, ensuring no oversell. Reservations expired after 5 minutes unless confirmed by payment; on confirm, we decremented durable stock. We propagated events via Kafka to downstream services.
  - Idempotency and correctness: Introduced request-level idempotency keys (userId+cartId+skuId) to handle retries and avoid double-reserves. On payment failure or timeout, we auto-released the reservation.
  - Backpressure and fail-safes: Per-SKU rate limits and a fallback path that gracefully rejects reservations when a SKU is at risk of oversubscription, surfacing real-time availability to the UI.
  - Migration: Dual-write and shadow-reads behind a feature flag. Synthetic load tests and a 10% canary before the full ramp.
- Alternatives considered:
  - DB row-level locks (SELECT ... FOR UPDATE) drove P99 to ~120 ms under burst load due to lock contention; rejected for latency and throughput.
  - Distributed locks added complexity and new failure modes; atomic Redis operations were simpler and faster.
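The reserve-if-available step can be illustrated with a small in-memory Python sketch. This is an illustration only, with hypothetical names: in the system described above, the check-and-decrement ran as a single Redis Lua script, which is what made it atomic under concurrent checkouts.

```python
import time

class ReservationStore:
    """In-memory stand-in for the Redis reservation layer (illustration only)."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.available = {}      # sku -> units currently reservable
        self.reservations = {}   # idempotency key -> (sku, qty, expires_at)

    def reserve(self, idem_key, sku, qty, now=None):
        """Reserve-if-available; atomic in production via one Redis Lua script."""
        now = time.time() if now is None else now
        if idem_key in self.reservations:
            return True   # idempotent retry: do not double-reserve
        if self.available.get(sku, 0) < qty:
            return False  # stock never goes negative, so no oversell
        self.available[sku] -= qty
        self.reservations[idem_key] = (sku, qty, now + self.ttl)
        return True
```

Because the availability check and the decrement happen in one atomic step, two concurrent requests for the last unit cannot both succeed, which is exactly the oversell property at stake.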
Result
- A/B test (10% traffic) showed cancellation rate dropped from 3.2% to 0.6% for treated traffic (81% reduction). P95 reservation latency was 18 ms; P99 was 27 ms.
- Customer contacts for out-of-stock post-purchase decreased by 65%; GMV improved by ~0.8% during promotions due to fewer failed orders. On-call pages related to stock inconsistencies dropped 70%.
- We fully rolled out in two weeks after canary, with zero downtime and no customer-visible incidents.
Learning / What I’d Do Differently
- Start chaos testing and failure-injection earlier to validate behavior under node loss, clock skew, and network partitions.
- Unify idempotency key standards across checkout and payments sooner to reduce one-off edge cases.
- Instrument per-SKU saturation and reservation churn metrics from day one to speed diagnosis during spikes.
- Run the broader comms and runbook dry-run with Ops earlier; the dry-run we eventually held reduced friction, but earlier alignment would have accelerated rollout.
## Technical Deep Dive (for follow-ups)
Key data model and operations
- On reserve: Atomically check and decrement available units for a SKU, write a reservation record with TTL, return success.
- On commit: Confirm reservation, durably decrement stock, clear reservation.
- On expire: Auto-release via TTL or asynchronous sweeper.
Sizing and guardrails (small numeric examples)
- If peak reservation rate is 1,000 QPS and TTL is 300 seconds, expected outstanding reservations ≈ 1,000 × 300 = 300,000. If each reservation's metadata is ~100 bytes, that is ~30 MB of reservation data in total (plus overhead), informing shard count and cluster sizing.
- Guardrails during experiments: error rate <0.1%, P99 latency <30 ms, cancellation rate decreasing, no regressions in payment success.
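The sizing arithmetic above is a direct application of Little's law, with the TTL as an upper bound on how long a reservation stays outstanding; a quick sanity check:

```python
# Back-of-envelope check: outstanding ≈ arrival rate × residence time.
peak_qps = 1_000              # peak reservation rate
ttl_seconds = 300             # 5-minute TTL bounds residence time
bytes_per_reservation = 100   # assumed metadata size

outstanding = peak_qps * ttl_seconds                    # 300,000 reservations
memory_mb = outstanding * bytes_per_reservation / 1e6   # ~30 MB total
```

In practice the average residence time is shorter than the TTL (most reservations confirm or release early), so this is a conservative upper bound, which is what you want for capacity planning.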
Correctness and edge cases
- Handle retries with idempotency keys; ensure at-least-once events from Kafka are idempotent downstream.
- Prevent negative stock using atomic operations; avoid distributed locks where possible.
- Consider long-running payments by tuning TTL or allowing reservation extensions.
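The first edge case above, making at-least-once Kafka delivery safe, reduces to deduplicating by event ID downstream. A minimal sketch with a hypothetical event shape (a real consumer would persist seen IDs transactionally with its state, not in memory):

```python
# Dedupe at-least-once delivery so redelivered events are not applied twice.
seen_event_ids = set()

def handle_event(event):
    """Apply an inventory event at most once per event_id."""
    if event["event_id"] in seen_event_ids:
        return "skipped"   # duplicate delivery: no double-apply
    seen_event_ids.add(event["event_id"])
    return "applied"
```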
Common pitfalls to avoid
- Over-indexing on tech without a clear success metric.
- Migrating without dual-read/write and clear rollback.
- Ignoring per-SKU hotspots (the long tail behaves differently from top sellers).
## Template You Can Reuse
- Situation: One sentence on business/user pain and why it mattered.
- Task: Your objective, SLOs, and constraints.
- Action: 3–5 bullets on design, your contributions, and explicit trade-offs.
- Result: Before/after metrics (latency, errors, $ impact, customer outcomes).
- Learning: What you’d change and why.
This structure shows ownership, technical depth, decision-making, and measurable impact—what interviewers look for in behavioral and leadership rounds.