Design a Horizontally Scalable Distributed Counter Service
Context
You are designing a distributed counter service used concurrently by many clients. A counter is an integer identified by a key (e.g., user:123:likes). The system must support very high throughput, be resilient to failures, and provide strong semantics for updates.
Assume counters fit within signed 64-bit integers, and the service may run across multiple data center racks within a single region.
Requirements
- Functional
  - Atomic increment and decrement operations per counter key.
  - Strong read-after-write consistency for a client immediately after a successful update.
  - Idempotent behavior under client retries (no double-apply).
  - Optional batch operations for efficiency.
- Non-functional
  - Horizontal scalability across many nodes.
  - High availability within a region; tolerate node and network failures.
  - Monitoring to detect contention hotspots and safe rollback/mitigation strategies.
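To make the functional requirements concrete, here is a minimal single-process sketch of what "atomic, idempotent increment" means for one counter key. It is an illustration of the expected semantics, not a proposed design: the names `CounterStore`, `add`, `get`, and the client-supplied `request_id` are assumptions that do not appear in the prompt, and the lock-guarded dictionaries stand in for whatever replicated storage a real answer would use.

```python
import threading

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


class CounterStore:
    """In-memory, single-process model of the functional requirements (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}   # counter key -> current value
        self._applied = {}  # (key, request_id) -> value returned for that request

    def add(self, key, delta, request_id):
        """Apply delta at most once per (key, request_id); a retry gets the first result."""
        with self._lock:
            dedup_key = (key, request_id)
            if dedup_key in self._applied:
                # Retry of an already-applied request: do not double-apply.
                return self._applied[dedup_key]
            new_value = self._values.get(key, 0) + delta
            if not INT64_MIN <= new_value <= INT64_MAX:
                raise OverflowError(f"counter {key!r} would leave the signed 64-bit range")
            self._values[key] = new_value
            self._applied[dedup_key] = new_value
            return new_value

    def get(self, key):
        """Read the current value; unknown keys read as 0 in this sketch."""
        with self._lock:
            return self._values.get(key, 0)


store = CounterStore()
first = store.add("user:123:likes", 1, request_id="req-1")
retry = store.add("user:123:likes", 1, request_id="req-1")  # same request, retried
print(first, retry, store.get("user:123:likes"))  # prints "1 1 1"
```

The sketch keeps every request ID forever, which a real service could not do; bounding the deduplication table (for example with a retention window) is one of the trade-offs the deliverables ask you to discuss.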
Deliverables
Describe:
- Data model and APIs.
- Concurrency control strategy (choose and justify among optimistic CAS, per-key sharding with single-writer, or distributed locks). Detail the read and write paths.
- How you ensure idempotency and correctness under retries and failures.
- Leader election and membership management.
- Partition tolerance choices (CAP trade-offs), handling of clock skew, and discussion of exactly-once vs at-least-once semantics.
- Monitoring for hotspots, and mitigation/rollback strategies when a single key becomes contended.
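As a starting point for two of the items above, the sketch below shows how a key might be routed to a single owning shard and how per-key write counts could surface a contended key. It illustrates only one of the listed concurrency options (per-key sharding) and is not a recommendation. The names `shard_for` and `HotKeyTracker`, the fixed shard count, and the simple count threshold are all assumptions made for illustration.

```python
import hashlib
from collections import Counter
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Map a counter key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class HotKeyTracker:
    """Counts writes per key within one window and flags keys at or above a threshold."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self._writes = Counter()

    def record_write(self, key: str) -> None:
        self._writes[key] += 1

    def hot_keys(self) -> List[str]:
        # Sorted so the result is deterministic for dashboards and tests.
        return sorted(k for k, n in self._writes.items() if n >= self.threshold)

    def reset_window(self) -> None:
        # Call at each window boundary (e.g. from a timer) to start a fresh count.
        self._writes.clear()


tracker = HotKeyTracker(threshold=3)
for _ in range(3):
    tracker.record_write("user:123:likes")
tracker.record_write("user:456:likes")
print(tracker.hot_keys())  # prints "['user:123:likes']"
```

Plain modulo hashing remaps most keys whenever `num_shards` changes, so an answer that expects the shard count to grow would likely discuss consistent hashing or an explicit key-range map instead.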