Choose tools for scalable distributed systems
Company: TikTok
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Technical Screen
You will be asked system design questions based on real product scenarios. For each scenario below:
1) Identify the *main requirements* (latency, throughput, durability, consistency, availability, cost, query patterns).
2) Choose an appropriate tool/architecture (DB/cache/queue/search/stream processing/etc.).
3) Justify trade-offs and list at least 2 risks/pitfalls.
## Scenarios
### A) Caching and read scalability
You have a read-heavy service with hot keys and strict p99 latency. How would you use a cache (e.g., Redis) and what cache strategy would you pick (cache-aside, read-through, write-through, write-back)? How do you prevent cache stampede and handle invalidation?
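One way to reason about this scenario is a cache-aside read path with per-key single-flight loading and jittered TTLs. The sketch below is illustrative only: it uses an in-process dictionary as a stand-in for Redis, and `CacheAside`, `loader`, `ttl_seconds`, and `jitter_seconds` are hypothetical names, not part of any library.

```python
import random
import threading
import time


class CacheAside:
    """Cache-aside reads with per-key single-flight loading (in-memory stand-in for Redis)."""

    def __init__(self, loader, ttl_seconds=60.0, jitter_seconds=10.0):
        self._loader = loader            # hypothetical function that reads the source of truth
        self._ttl = ttl_seconds
        self._jitter = jitter_seconds
        self._data = {}                  # key -> (value, expires_at)
        self._key_locks = {}             # key -> lock that serializes loads for that key
        self._guard = threading.Lock()   # protects _data and _key_locks

    def _fresh(self, key):
        # Return the cached entry if it exists and has not expired, else None.
        with self._guard:
            entry = self._data.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]              # cache hit
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:                       # only one thread per key reaches the loader
            entry = self._fresh(key)     # re-check: another thread may have filled it
            if entry is not None:
                return entry[0]
            value = self._loader(key)
            # Jitter spreads expirations so hot keys do not all expire at once.
            expires_at = time.monotonic() + self._ttl + random.uniform(0, self._jitter)
            with self._guard:
                self._data[key] = (value, expires_at)
            return value

    def invalidate(self, key):
        # Intended to be called after the database write commits:
        # delete the cached copy rather than updating it in place.
        with self._guard:
            self._data.pop(key, None)
```

With this shape, eight concurrent `get("video:1")` calls on a cold key trigger a single load. In a multi-process deployment the per-key lock would itself have to be distributed (or replaced by request coalescing at a proxy), and a write racing with a slow load can still repopulate a stale value, which is one of the pitfalls worth naming in an answer.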
### B) Concurrency control
A user can place an order and you must avoid overselling inventory under high concurrency.
- When is **optimistic locking** appropriate vs **pessimistic locking**?
- If you use Redis for coordination, how would you implement a correct distributed lock?
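Both bullets can be made concrete with a small sketch. The first part shows optimistic locking as a version check on write; the second shows the two properties usually expected of a Redis-style lock: acquire only if absent with an expiry (as in `SET key token NX PX ttl`), and release only if the caller's token still matches (commonly done atomically with a Lua script). Everything here is an in-memory simulation under those assumptions; `VersionedStock`, `place_order`, and `TokenLockStore` are hypothetical names.

```python
import threading
import time
import uuid


class VersionedStock:
    """Optimistic locking: a write succeeds only if the row version is unchanged."""

    def __init__(self, quantity):
        self.quantity = quantity
        self.version = 0
        self._mutex = threading.Lock()   # stands in for the database's atomic UPDATE

    def read(self):
        with self._mutex:
            return self.quantity, self.version

    def compare_and_set(self, expected_version, new_quantity):
        # Similar in spirit to:
        #   UPDATE stock SET qty = ?, version = version + 1 WHERE id = ? AND version = ?
        with self._mutex:
            if self.version != expected_version:
                return False             # someone else wrote first; caller must re-read
            self.quantity = new_quantity
            self.version += 1
            return True


def place_order(stock, amount, max_retries=5):
    for _ in range(max_retries):
        quantity, version = stock.read()
        if quantity < amount:
            return False                 # not enough stock left
        if stock.compare_and_set(version, quantity - amount):
            return True
    return False                         # gave up after repeated conflicts


class TokenLockStore:
    """In-memory stand-in for a lock key with a TTL and an owner token."""

    def __init__(self):
        self._locks = {}                 # name -> (token, expires_at)
        self._mutex = threading.Lock()

    def acquire(self, name, ttl_seconds):
        # Set only if absent or expired; the TTL bounds how long a crashed holder blocks others.
        now = time.monotonic()
        with self._mutex:
            held = self._locks.get(name)
            if held is not None and held[1] > now:
                return None
            token = uuid.uuid4().hex
            self._locks[name] = (token, now + ttl_seconds)
            return token

    def release(self, name, token):
        # Delete only when the token matches, so a holder whose lock expired
        # cannot release a lock that now belongs to someone else.
        with self._mutex:
            held = self._locks.get(name)
            if held is not None and held[0] == token:
                del self._locks[name]
                return True
            return False
```

Optimistic locking tends to suit low-contention items because conflicts are rare and retries are cheap; under heavy contention on one SKU, retries pile up and a pessimistic lock or a single atomic decrement may behave better. Note that a TTL-based lock alone does not stop a paused holder from writing after its lock expired, so the protected resource usually still needs its own check such as a version or fencing token.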
### C) Asynchronous processing
You need to process events (e.g., payments, notifications) reliably without slowing down the request path. When do you pick:
- a message queue
- a log-based streaming platform
- a task scheduler / cron
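The difference between the first two options is largely about delivery and retention semantics, which a toy model can make visible. The sketch below is a simplified in-memory illustration, not a model of any specific broker; `WorkQueue` and `EventLog` are hypothetical names, and `poll` commits the offset immediately for brevity.

```python
from collections import deque


class WorkQueue:
    """Queue semantics: each message goes to one consumer and is removed once acknowledged."""

    def __init__(self):
        self._pending = deque()
        self._in_flight = {}             # msg_id -> body, delivered but not yet acked
        self._next_id = 0

    def publish(self, body):
        self._pending.append((self._next_id, body))
        self._next_id += 1

    def receive(self):
        if not self._pending:
            return None
        msg_id, body = self._pending.popleft()
        self._in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        self._in_flight.pop(msg_id, None)    # the message is gone for good

    def nack(self, msg_id):
        # Redelivery after failure means handlers may see a message twice,
        # so handlers should be idempotent.
        body = self._in_flight.pop(msg_id)
        self._pending.append((msg_id, body))


class EventLog:
    """Log semantics: records are retained and each consumer group tracks its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}               # group -> next offset to read

    def append(self, body):
        self._records.append(body)
        return len(self._records) - 1    # offset of the new record

    def poll(self, group, max_records=10):
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        self._offsets[group] = offset    # rewind to replay earlier records
```

In this model a queue fits per-task work such as sending one notification, while a log fits cases where several independent consumers (say, billing and analytics) each need the full event history or the ability to replay it. A scheduler or cron covers the remaining case: work triggered by time rather than by an event.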
### D) Storage selection
Pick a primary storage solution and justify your choice, covering:
- relational DB vs key-value store vs document store
- when you would introduce sharding and what shard key you would choose
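To make the shard-key discussion concrete, here is a minimal sketch of hash-based shard routing. It assumes a fixed shard count and string keys; `shard_for` and `distribution` are hypothetical helper names. A stable hash from `hashlib` is used because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib


def shard_for(shard_key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys, num_shards):
    """Count how many rows land on each shard for a candidate shard key."""
    counts = [0] * num_shards
    for key in keys:
        counts[shard_for(key, num_shards)] += 1
    return counts


if __name__ == "__main__":
    # High-cardinality key (e.g. a user ID): rows spread across all shards.
    print(distribution([f"user:{i}" for i in range(10_000)], 8))
    # Low-cardinality key (e.g. a country code): most rows pile onto very few shards.
    print(distribution(["US"] * 900 + ["BR"] * 100, 8))
```

Running `distribution` on sample data for each candidate key is a quick way to spot skew before committing to it. Plain modulo routing also moves most keys when `num_shards` changes, which is why answers often mention consistent hashing or a fixed number of virtual buckets mapped to physical shards.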
### E) Observability and performance
What metrics and dashboards would you define to prove the system scales? How would you debug rising tail latency?
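A short worked example shows why dashboards for this question should track percentiles rather than averages. The sketch below uses the nearest-rank definition of a percentile on raw samples; `percentile` and `latency_summary` are hypothetical helpers, and production systems typically compute percentiles from histograms rather than from stored samples.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarize request latencies in milliseconds."""
    return {
        "mean": sum(samples_ms) / len(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }


if __name__ == "__main__":
    # 98 fast requests and 2 slow ones: the mean looks healthy, the p99 does not.
    samples = [10] * 98 + [1000] * 2
    print(latency_summary(samples))
```

For the sample above, the mean is 29.8 ms and the p50 is 10 ms, while the p99 is 1000 ms. When debugging rising tail latency, breaking the same percentiles down by endpoint, shard, host, and downstream dependency usually narrows the search faster than looking at a global average.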
Quick Answer: This question tests your ability to design scalable distributed systems: analyzing requirements (latency, throughput, durability, consistency, availability, cost, query patterns) and selecting tools across caching, databases, queues, stream processing, concurrency control, sharding, and observability.