Describe a time you were responsible for a storage/distributed-systems/infra component (or a similarly low-level, reliability-critical module).
The interviewer will probe beyond concepts into implementation details. Address:
- What was the component and its role in the system (data path vs. metadata path)?
- What reliability/performance goals existed (SLO/SLA, durability, p99 latency)?
- A specific incident or hard problem you faced (e.g., data inconsistency, corruption risk, replication lag, deadlock, performance regression).
- How you debugged it (signals, logs/metrics/traces, reproduction, hypothesis testing).
- What trade-offs you made and why.
- How you drove the fix to completion (testing, rollout, backfill/repair, postmortem, prevention).
If you have limited direct storage experience, you may use an adjacent example (a caching layer, messaging system, or concurrency-heavy service), but be explicit about what was similar and what was different.