System Design: Two Production-Ready Services
A) Thread-Safe LRU Caching Service
Design and describe a production-ready, in-process LRU cache that can also be deployed as a networked cache service.
Specify:
- API surface (in-process library and optional network endpoints): get/put/delete/computeIfAbsent, per-entry TTL, size limits, stats. (A minimal cache sketch appears at the end of this section.)
- Concurrency control inside a process (e.g., fine-grained locks, lock striping, or lock-free techniques). Ensure eviction is correct under concurrent access.
- TTL and capacity management (entry-count limits, byte-weighted sizing, admission control).
- Metrics and observability (hits/misses, evictions, latencies, saturation, traces, logs, debug endpoints).
- Scaling across multiple instances, either:
  - using Redis (or another remote cache): topology, client behavior, backpressure, failure handling; or
  - client-side sharding with consistent hashing: hot-key mitigation, replication/failover, rebalancing, consistency model. (See the hash-ring sketch below.)
- Handling hot keys, backpressure, and data-consistency semantics (cache-aside/write-through, stale reads, idempotency of updates, CAS). (See the cache-aside sketch below.)
Provide concrete choices and trade-offs.
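
To illustrate the API surface, in-process concurrency, TTL handling, and basic stats counters, here is a minimal sketch in Java. It is a starting point under stated assumptions, not the implementation: it uses one coarse lock (lock striping or a library such as Caffeine would be the production choice), byte-weighted sizing and admission control are omitted, and all class and method names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

/** Minimal thread-safe LRU cache with per-entry TTL and basic stats.
 *  One coarse lock for clarity; production code would stripe locks
 *  (N independent segments) or use a library such as Caffeine. */
public final class LruTtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtNanos;
        Entry(V value, long expiresAtNanos) { this.value = value; this.expiresAtNanos = expiresAtNanos; }
    }

    private final int maxEntries;
    private final LinkedHashMap<K, Entry<V>> map;  // access order = LRU order
    private final LongAdder hits = new LongAdder(), misses = new LongAdder(), evictions = new LongAdder();

    public LruTtlCache(int maxEntries) {
        this.maxEntries = maxEntries;
        this.map = new LinkedHashMap<K, Entry<V>>(16, 0.75f, /*accessOrder=*/true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                boolean evict = size() > LruTtlCache.this.maxEntries;
                if (evict) evictions.increment();
                return evict;  // LinkedHashMap drops the LRU entry for us
            }
        };
    }

    public synchronized V get(K key) {
        Entry<V> e = map.get(key);  // the lookup also refreshes recency
        if (e == null || System.nanoTime() >= e.expiresAtNanos) {
            if (e != null) map.remove(key);  // lazy expiry of a stale entry
            misses.increment();
            return null;
        }
        hits.increment();
        return e.value;
    }

    public synchronized void put(K key, V value, long ttlMillis) {
        map.put(key, new Entry<>(value, System.nanoTime() + ttlMillis * 1_000_000L));
    }

    public synchronized void delete(K key) { map.remove(key); }

    /** Load-through helper. Note: the loader runs while holding the cache
     *  lock, which is the simplification to remove first in production. */
    public synchronized V computeIfAbsent(K key, long ttlMillis, Function<K, V> loader) {
        V v = get(key);
        if (v == null) { v = loader.apply(key); put(key, v, ttlMillis); }
        return v;
    }

    public synchronized String stats() {
        return "hits=" + hits.sum() + " misses=" + misses.sum() + " evictions=" + evictions.sum();
    }
}
```

Lock striping would replace the single monitor with N independent segments hashed by key, at the cost of only-approximate global LRU order; that approximation is the usual trade-off to name explicitly.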
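
For the client-side sharding option, a consistent-hash ring with virtual nodes keeps key movement proportional to the capacity change when nodes join or leave. A sketch under the same assumptions (names are illustrative; the ring itself is not synchronized, so a production client would guard membership changes with a ReadWriteLock):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;
import java.util.TreeMap;

/** Consistent-hash ring with virtual nodes for client-side sharding.
 *  Not thread-safe: guard addNode/removeNode with a read-write lock. */
public final class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int vnodes;

    public HashRing(List<String> nodes, int vnodesPerNode) {
        this.vnodes = vnodesPerNode;
        nodes.forEach(this::addNode);
    }

    public void addNode(String node) {
        for (int i = 0; i < vnodes; i++) ring.put(hash(node + "#" + i), node);
    }

    public void removeNode(String node) {
        for (int i = 0; i < vnodes; i++) ring.remove(hash(node + "#" + i));
    }

    /** Maps a key to the first virtual node clockwise on the ring. */
    public String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("empty ring");
        Long h = ring.ceilingKey(hash(key));
        return ring.get(h != null ? h : ring.firstKey());  // wrap around at the top
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h;
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // MD5 is always present in the JDK
        }
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(List.of("cache-a", "cache-b", "cache-c"), 128);
        System.out.println("user:42 -> " + ring.nodeFor("user:42"));
        ring.removeNode("cache-b");  // only keys owned by cache-b's vnodes remap
        System.out.println("user:42 -> " + ring.nodeFor("user:42"));
    }
}
```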
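
Finally, a sketch of cache-aside semantics with a version-based CAS guard, showing why writes invalidate rather than update the cache and how versions make refreshes idempotent. The Versioned record and the in-memory "db" map are hypothetical stand-ins, not a proposed schema:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Cache-aside with versioned values: reads fill the cache, writes hit the
 *  source of truth first and then invalidate, and a version check rejects
 *  stale overwrites (a CAS-style guard). All names are illustrative. */
public final class CacheAsideDemo {
    record Versioned(String value, long version) {}

    static final Map<String, Versioned> db = new ConcurrentHashMap<>();     // stand-in for the database
    static final Map<String, Versioned> cache = new ConcurrentHashMap<>();  // stand-in for the cache

    /** Read path: cache hit, else load from the DB and populate the cache. */
    static Versioned read(String key) {
        Versioned v = cache.get(key);
        if (v != null) return v;        // possibly stale until TTL or invalidation
        v = db.get(key);
        // Window between db.get and putIfAbsent is the classic cache-aside
        // race with a concurrent writer; TTLs bound the resulting staleness.
        if (v != null) cache.putIfAbsent(key, v);
        return v;
    }

    /** Write path: update the DB, then invalidate rather than update the
     *  cache (avoids racing writers leaving the cache newer than the DB). */
    static void write(String key, String value) {
        db.merge(key, new Versioned(value, 1),
                 (old, ignored) -> new Versioned(value, old.version() + 1));
        cache.remove(key);
    }

    /** CAS-style refresh: only install a value at least as new as what the
     *  cache already holds, making refreshes idempotent and reorder-safe. */
    static void refresh(String key, Versioned fresh) {
        cache.merge(key, fresh, (old, neu) -> neu.version() >= old.version() ? neu : old);
    }

    public static void main(String[] args) {
        write("acct:7", "balance=100");
        System.out.println(read("acct:7"));  // miss -> DB -> cache, version 1
        write("acct:7", "balance=90");       // DB update + invalidation
        System.out.println(read("acct:7"));  // miss again -> fresh value, version 2
    }
}
```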
B) Compute-Heavy "Find Transfer Combinations" API
Design an API that computes transfer route combinations (e.g., finding valid paths across payment rails/providers under constraints) and that must absorb sudden traffic spikes.
Specify:
- Stateless vs. stateful design; avoid shared mutable state in workers and define per-request memory management.
- Caching and precomputation strategies; request de-duplication, both in-process and cross-instance. (See the coalescing sketch at the end of this section.)
- Rate limiting, timeouts/deadlines, cancellation, idempotency. (See the token-bucket sketch below.)
- Horizontal scaling and autoscaling policies; synchronous vs. asynchronous job modes.
- Capacity estimates with clear assumptions and formulas. (A worked example appears below.)
- A testing plan covering correctness, performance, reliability, and operability.
Make any minimal, explicit assumptions needed so the design is concrete.
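
For in-process request de-duplication, a "single flight" coalescer lets concurrent callers for the same key share one computation. A minimal sketch, with illustrative names; the computation runs synchronously on the winning caller's thread here, whereas a production version would submit it to a bounded executor:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** "Single flight" de-duplication: concurrent callers for the same key
 *  share one computation instead of each running it independently. */
public final class Coalescer<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public CompletableFuture<V> get(K key, Supplier<V> compute) {
        CompletableFuture<V> created = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, created);
        if (existing != null) return existing;      // piggyback on the in-flight call
        try {
            created.complete(compute.get());        // this caller won the race
        } catch (RuntimeException e) {
            created.completeExceptionally(e);       // all waiters observe the failure
        } finally {
            inFlight.remove(key, created);          // the next request recomputes fresh
        }
        return created;
    }
}
```

Cross-instance de-duplication needs a shared coordination point, e.g., a short-lived lock or lease in Redis (SET with NX and a TTL) plus a shared result cache so the losers can read the winner's answer.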
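
For rate limiting, a token bucket allows a controlled burst on top of a steady rate; rejected requests should fail fast (e.g., HTTP 429) so downstream deadlines and cancellation stay meaningful. A minimal sketch usable per key or globally, with illustrative parameters:

```java
/** Minimal token-bucket rate limiter: refills at a fixed rate up to a burst
 *  cap; tryAcquire is non-blocking so callers can shed load immediately. */
public final class TokenBucket {
    private final double capacity;       // max burst size, in tokens
    private final double refillPerNano;  // steady-state permit rate
    private double tokens;
    private long lastRefillNanos;

    public TokenBucket(double permitsPerSecond, double burstCapacity) {
        this.capacity = burstCapacity;
        this.refillPerNano = permitsPerSecond / 1_000_000_000.0;
        this.tokens = burstCapacity;
        this.lastRefillNanos = System.nanoTime();
    }

    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens < 1.0) return false;  // over the limit: reject or queue
        tokens -= 1.0;
        return true;
    }
}
```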
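
For the capacity estimate, one worked example of turning assumptions into an instance count; every number below is a hypothetical assumption to be replaced with measurements. By Little's law, concurrent CPU demand equals arrival rate times per-request CPU time:

```
assume: 1,000 RPS peak, 200 ms CPU per request, 8 vCPUs per instance, 60% target utilization

concurrent CPU demand  = 1,000 req/s × 0.2 s          = 200 cores
usable cores/instance  = 8 × 0.6                      = 4.8
instances needed       = ceil(200 / 4.8) = ceil(41.7) = 42  (before spike headroom)
```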