Design an async job system and cache layer
Company: Salesforce
Role: Software Engineer
Category: System Design
Difficulty: Hard
Interview Round: Technical Screen
Design two systems. You can assume a large-scale production environment; focus on clear APIs, data models, scaling, reliability, and trade-offs.
# Part A) Design an asynchronous Job/Task system (service-oriented)
Design a service that lets clients submit background jobs and later query results.
## Requirements
- Clients can create jobs and poll/subscribe for status.
- Jobs move through well-defined states (e.g., pending/running/succeeded/failed/canceled).
- Support retries with exponential backoff.
- Ensure idempotency (no duplicate execution for the same logical request).
- Choose storage (RDBMS vs NoSQL) and justify.
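The state and retry requirements above can be sketched as follows. This is a minimal illustration, not a prescribed answer: the names `JobState`, `ALLOWED`, `transition`, and `backoff_delay` are hypothetical, the `RUNNING -> PENDING` edge (used to reschedule a retry) is an assumption, and the base and cap values are arbitrary. A production system would usually also add random jitter to the delay.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELED = "canceled"


# Allowed transitions. Terminal states have no outgoing edges.
# RUNNING -> PENDING models a failed attempt being rescheduled (an assumption).
ALLOWED = {
    JobState.PENDING: {JobState.RUNNING, JobState.CANCELED},
    JobState.RUNNING: {
        JobState.SUCCEEDED,
        JobState.FAILED,
        JobState.CANCELED,
        JobState.PENDING,
    },
    JobState.SUCCEEDED: set(),
    JobState.FAILED: set(),
    JobState.CANCELED: set(),
}


def transition(current: JobState, target: JobState) -> JobState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based), capped at `cap`."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap, base * 2 ** (attempt - 1))
```

Keeping the transition table as data makes it easy to enforce the same rules in the API layer and in the workers, and to explain in an interview which states are terminal.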
## Discuss explicitly
- API design and state transitions
- Worker model / scheduling
- Retry + backoff, dead-letter handling
- Idempotency strategy
- Observability (metrics/logging/tracing)
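For the idempotency discussion point, one common approach is a client-supplied idempotency key that maps to at most one job. The sketch below is an in-memory stand-in, assuming a hypothetical `JobStore` class and `submit` method. In a real deployment the same check would typically rely on a unique constraint or conditional write in the chosen database rather than a process-local lock.

```python
import uuid
from dataclasses import dataclass, field
from threading import Lock
from typing import Dict, Tuple


@dataclass
class JobStore:
    """In-memory stand-in for a table with a unique index on idempotency_key."""

    _by_key: Dict[str, str] = field(default_factory=dict)
    _lock: Lock = field(default_factory=Lock)

    def submit(self, idempotency_key: str, payload: dict) -> Tuple[str, bool]:
        """Return (job_id, created). A repeated key returns the original job_id."""
        with self._lock:
            existing = self._by_key.get(idempotency_key)
            if existing is not None:
                # Duplicate request: do not create or enqueue a second job.
                return existing, False
            job_id = str(uuid.uuid4())
            self._by_key[idempotency_key] = job_id
            # A real system would persist `payload` and enqueue the job here.
            return job_id, True
```

A client that retries a timed-out submit with the same key gets the original job ID back:

```python
store = JobStore()
first_id, created = store.submit("order-123-export", {"format": "csv"})
retry_id, created_again = store.submit("order-123-export", {"format": "csv"})
# first_id == retry_id, created is True, created_again is False
```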
---
# Part B) Design a high-throughput cache system (Redis-like / in-memory layer)
Design a caching layer in front of a database to reduce latency and increase throughput.
## Requirements
- Very high QPS, low-latency reads, and concurrent reads/writes.
- Cache key design, eviction (LRU/LFU/TTL).
- Multi-node scaling and sharding.
- Cache consistency and invalidation strategy; explain trade-offs vs DB consistency.
- Handle hot keys.
- High availability and performance under failures.
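The eviction requirement can be illustrated with a single-node cache that combines LRU eviction with a per-entry TTL. This is a minimal sketch under stated assumptions: the class name `LRUTTLCache` and its parameters are hypothetical, the structure is not thread-safe (a real node would add locking or shard the map), and a stored value of `None` cannot be distinguished from a miss.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class LRUTTLCache:
    """Bounded cache: evicts the least recently used entry, expires by TTL."""

    def __init__(
        self,
        capacity: int,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._items: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Lazy expiration: drop the stale entry on access.
            del self._items[key]
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._items[key] = (value, self._clock() + self._ttl)
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict least recently used
```

Expiration here is lazy (checked on read). Redis-like systems usually pair lazy expiration with periodic background sampling so that keys nobody reads again still get reclaimed.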
## Discuss explicitly
- Consistency model and write/read paths
- Invalidation strategies and failure modes
- Sharding/replication and rebalancing
- Hot key mitigation and rate limiting
- Capacity planning and SLOs
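For the sharding and rebalancing point, consistent hashing with virtual nodes is one widely used option, since removing a node only remaps the keys that node owned. The sketch below assumes a hypothetical `HashRing` class, SHA-256 as the hash, and 100 virtual nodes per physical node. None of these choices come from the question itself.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Return the node owning `key`: the first vnode clockwise from its hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring
```

When a node is removed, only the keys it owned move to another node. Keys owned by the surviving nodes keep their placement, which limits the cache-miss spike during rebalancing.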
Quick Answer: This question tests two system design skills: asynchronous job orchestration with reliable background processing (state transitions, retries, idempotency), and high-throughput in-memory caching (eviction, sharding, consistency, hot-key mitigation). It calls for both conceptual understanding and practical detail. Interviewers commonly use it to assess whether an engineer can define clear APIs and data models and can reason about scaling, fault tolerance, observability, and the trade-offs between consistency, latency, and reliability in large-scale production environments.