This question evaluates a candidate's skills in designing large-scale distributed systems, focusing on near-real-time ingestion and aggregation, idempotency/deduplication, time-windowed (tumbling and sliding) analytics, hot-key sharding, storage and backfill strategies, and operational concerns including monitoring and privacy.
Build a service that ingests high-throughput client events and provides near-real-time aggregations of activity counts per user, device, and region. The system must support time-windowed queries (tumbling and sliding), deduplication/idempotency, hot-key sharding, and privacy-by-design. It should be resilient, observable, and support backfill/reprocessing.
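As a rough illustration of how deduplication and the two window types fit together, the following in-memory sketch counts each event once per dimension into fixed tumbling buckets and derives sliding-window counts by summing recent buckets. The event field names (`event_id`, `event_time`, `user`, `device`, `region`), the 60-second bucket size, and the `WindowedCounter` class are assumptions made for illustration; a real design would keep this state in a stream processor or a key-value store and bound the dedup set with a TTL.

```python
from collections import defaultdict

BUCKET_SECONDS = 60  # assumed tumbling-window size


class WindowedCounter:
    """In-memory sketch: idempotent ingestion plus tumbling and sliding counts."""

    def __init__(self, bucket_seconds=BUCKET_SECONDS):
        self.bucket_seconds = bucket_seconds
        # Event IDs already counted. Unbounded here; a real system would expire entries.
        self.seen_event_ids = set()
        # (dimension, key, bucket_start) -> count
        self.counts = defaultdict(int)

    def _bucket_start(self, event_time):
        """Align a timestamp (in seconds) to the start of its tumbling bucket."""
        t = int(event_time)
        return t - t % self.bucket_seconds

    def ingest(self, event):
        """Count an event once; return False if the event ID was already seen."""
        if event["event_id"] in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event["event_id"])
        bucket = self._bucket_start(event["event_time"])
        for dimension in ("user", "device", "region"):
            self.counts[(dimension, event[dimension], bucket)] += 1
        return True

    def tumbling_count(self, dimension, key, window_start):
        """Count for the single bucket that starts at window_start."""
        return self.counts.get((dimension, key, window_start), 0)

    def sliding_count(self, dimension, key, as_of, window_buckets):
        """Sum the last `window_buckets` buckets, ending with the one containing as_of."""
        newest = self._bucket_start(as_of)
        return sum(
            self.counts.get((dimension, key, newest - i * self.bucket_seconds), 0)
            for i in range(window_buckets)
        )
```

Bucketing by the event's own timestamp rather than arrival time means a retried or delayed delivery lands in the same bucket as the original. The sliding window here advances in whole-bucket steps, which trades some precision for cheap reads.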
Assume the service is multi-tenant and globally deployed with regional data residency. Reads should reflect new events within seconds, writes arrive at very high throughput, and clients may go offline and sync their buffered events later.
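One possible way to reason about these assumptions is to decide, for each incoming event, which region may process it, which partition it is written to, and whether it is still eligible for the live aggregation path. The sketch below is illustrative only: the tenant-to-region table, the 300-second allowed lateness, the salt count of 8, and all function names are hypothetical choices, not part of the question.

```python
import hashlib

ALLOWED_LATENESS_SECONDS = 300  # assumed: how far behind the watermark an event may be
SALT_BUCKETS = 8                # assumed: number of sub-partitions for a hot key
TENANT_HOME_REGION = {          # hypothetical residency table
    "tenant-a": "eu-west",
    "tenant-b": "us-east",
}


def home_region(tenant_id):
    """Residency: a tenant's events are stored and processed only in its home region."""
    return TENANT_HOME_REGION[tenant_id]


def write_partition_key(tenant_id, entity_key, event_id, hot_keys):
    """Spread writes for known hot keys across SALT_BUCKETS sub-partitions."""
    base = f"{tenant_id}:{entity_key}"
    if entity_key not in hot_keys:
        return base
    # Deterministic salt from the event ID, so a retried event hits the same sub-partition.
    digest = hashlib.sha256(event_id.encode("utf-8")).hexdigest()
    return f"{base}#{int(digest, 16) % SALT_BUCKETS}"


def read_partition_keys(tenant_id, entity_key, hot_keys):
    """All sub-partitions a reader must sum to get the full count for a key."""
    base = f"{tenant_id}:{entity_key}"
    if entity_key not in hot_keys:
        return [base]
    return [f"{base}#{salt}" for salt in range(SALT_BUCKETS)]


def classify(event_time, watermark):
    """Route to 'live' if within allowed lateness of the watermark, else 'backfill'."""
    if event_time >= watermark - ALLOWED_LATENESS_SECONDS:
        return "live"
    return "backfill"
```

Salting moves the cost of a hot key from the write path to the read path, since readers must fan out over every sub-partition. Events from clients that sync long after going offline fall outside the allowed lateness and are sent to the backfill path instead of reopening live windows.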