
You receive an unbounded stream of events: (event_time, user_id, category). Events may arrive up to 48 hours late and are not ordered by time. Design and implement an API with two operations: ingest(event), and query(window_start, window_end), which returns, for each hour in [start, end), the number of unique users and the 95th percentile of per-user interarrival delay, computed from event_time (not processing time).

Constraints:
(a) updates for late arrivals must retroactively correct prior aggregates;
(b) each ingest must be amortized O(log n) or better;
(c) memory must be bounded by O(W·C), where W is the 48-hour correction window (in hourly buckets) and C is the number of active categories;
(d) a query over H hours must run in O(H + C·log W).

Describe your data structures (e.g., a time-bucketed hash keyed by hour and category, an approximate-distinct sketch such as HyperLogLog for unique users, and an order-statistics sketch such as t-digest for delays), explain how you handle clock skew, deduplication, and idempotency, and provide test cases that demonstrate correctness when late data arrives exactly at the 48-hour boundary.
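One way to make the required semantics concrete is a small reference implementation. The sketch below is a deliberately exact, single-process stand-in: plain sets replace HyperLogLog and a sorted list replaces t-digest, so it does NOT meet the memory bound — it only demonstrates the ingest/query contract, dedup/idempotency, the watermark-based 48-hour cutoff, and retroactive correction. All names (`HourlyAggregator`, `HOUR`, `LATE_WINDOW`) are illustrative, not part of the problem statement.

```python
import bisect
import math
from collections import defaultdict

HOUR = 3600                     # bucket width in seconds (assumed granularity)
LATE_WINDOW = 48 * HOUR         # the 48-hour correction window from the prompt

class HourlyAggregator:
    """Exact stand-in for the sketched design: sets instead of HyperLogLog,
    sorted lists instead of t-digest, so memory is unbounded. Shows the
    semantics only, not the complexity/memory constraints."""

    def __init__(self):
        self.seen = set()                     # (event_time, user_id): dedup / idempotency
        self.user_times = defaultdict(list)   # user_id -> sorted event_times
        self.bucket_users = defaultdict(set)  # hour bucket -> user_ids active that hour
        self.max_time = 0                     # watermark: largest event_time ingested

    def ingest(self, event_time, user_id, category=None):
        self.max_time = max(self.max_time, event_time)
        if event_time < self.max_time - LATE_WINDOW:
            return False                      # older than the correction window: drop
        key = (event_time, user_id)
        if key in self.seen:
            return False                      # re-ingesting a duplicate is a no-op
        self.seen.add(key)
        # O(n) list insert here; a balanced tree or skip list gives the required O(log n)
        bisect.insort(self.user_times[user_id], event_time)
        self.bucket_users[event_time // HOUR].add(user_id)
        return True

    def query(self, start, end):
        """For each hour bucket in [start, end), return a tuple
        (bucket, unique_users, p95_interarrival_delay)."""
        out = []
        for b in range(start // HOUR, end // HOUR):
            users = self.bucket_users.get(b, set())
            delays = []
            for u in users:
                ts = self.user_times[u]
                lo = bisect.bisect_left(ts, b * HOUR)
                hi = bisect.bisect_left(ts, (b + 1) * HOUR)
                for i in range(lo, hi):
                    if i > 0:                 # delay to this user's previous event
                        delays.append(ts[i] - ts[i - 1])
            p95 = None
            if delays:
                delays.sort()
                p95 = delays[math.ceil(0.95 * len(delays)) - 1]  # nearest-rank p95
            out.append((b, len(users), p95))
        return out
```

Because `query` recomputes each bucket from the bucketed state rather than caching results, a late `ingest` is reflected automatically on the next query, which is one simple way to satisfy constraint (a); a production design would instead patch per-bucket sketches in place.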
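For the approximate-distinct piece the prompt names, a toy HyperLogLog helps show why it fits the late-data requirement: registers merge by element-wise max, so re-adding a duplicate or merging a correction sketch into an hour bucket is idempotent. The precision `p=12` and the SHA-1 hash are illustrative choices, not prescribed by the problem.

```python
import hashlib
import math

class HyperLogLog:
    """Toy HyperLogLog: m = 2^p registers, each holding the maximum
    leading-zero rank seen for items hashed into it."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.reg = [0] * self.m
        # standard bias-correction constant, valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash: first p bits select a register, the rest give the rank
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1   # position of leftmost 1-bit
        self.reg[idx] = max(self.reg[idx], rank)       # max makes add() idempotent

    def merge(self, other):
        # register-wise max estimates the union; overlap costs nothing,
        # which is what makes per-bucket retroactive corrections cheap
        self.reg = [max(a, b) for a, b in zip(self.reg, other.reg)]

    def count(self):
        est = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.reg)
        if est <= 2.5 * self.m:                        # small-range correction
            zeros = self.reg.count(0)
            if zeros:
                est = self.m * math.log(self.m / zeros)
        return est
```

In the bucketed design, each (hour, category) cell holds one such sketch in O(2^p) memory, giving the O(W·C) bound; a late event just calls `add` on the cell for its event-time hour.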