Implement a Python function to compute streaming, per-campaign CTR over a sliding 24-hour window with click de-duplication and late-arriving events.
Requirements:
-
Input: two iterators of dicts sorted non-decreasing by ts (ISO 8601 strings):
impressions: {"imp_id", "campaign_id", "ts"}
clicks: {"click_id", "imp_id", "campaign_id", "ts"}
-
Dedup: count at most one click per imp_id (keep earliest click). Ignore clicks whose imp_id was never seen.
-
Sliding window: maintain CTR for each campaign over the last 24 hours at every new event. Late events may arrive up to 10 minutes late; include them if they fall within the 24-hour window based on their ts.
-
Performance: O(log n) amortized updates per event, memory proportional to events within the last 24 hours. Provide big-O justification.
-
Output: an iterator of tuples (ts, campaign_id, impressions_in_window, dedup_clicks_in_window, ctr_in_window) emitted whenever the metric changes for that campaign.
-
Edge cases: multiple clicks referencing same imp_id, clock skew between streams, and daylight saving transitions.
Provide a brief explanation of your data structures (e.g., heaps/queues + hash maps) and how you handle late events and expirations.