PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates streaming data-processing and algorithmic design skills, specifically time-windowed aggregation, click deduplication, late-event handling, state management, and selection of data structures for per-campaign CTR computation.

  • Medium
  • Roblox
  • Coding & Algorithms
  • Data Scientist

Implement streaming CTR with deduplication

Company: Roblox

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Implement a Python function to compute streaming, per-campaign CTR over a sliding 24-hour window with click de-duplication and late-arriving events. Requirements: - Input: two iterators of dicts sorted non-decreasing by ts (ISO 8601 strings): impressions: {"imp_id", "campaign_id", "ts"} clicks: {"click_id", "imp_id", "campaign_id", "ts"} - Dedup: count at most one click per imp_id (keep earliest click). Ignore clicks whose imp_id was never seen. - Sliding window: maintain CTR for each campaign over the last 24 hours at every new event. Late events may arrive up to 10 minutes late; include them if they fall within the 24-hour window based on their ts. - Performance: O(log n) amortized updates per event, memory proportional to events within the last 24 hours. Provide big-O justification. - Output: an iterator of tuples (ts, campaign_id, impressions_in_window, dedup_clicks_in_window, ctr_in_window) emitted whenever the metric changes for that campaign. - Edge cases: multiple clicks referencing same imp_id, clock skew between streams, and daylight saving transitions. Provide a brief explanation of your data structures (e.g., heaps/queues + hash maps) and how you handle late events and expirations.

Quick Answer: This question evaluates streaming data-processing and algorithmic design skills, specifically time-windowed aggregation, click deduplication, late-event handling, state management, and selection of data structures for per-campaign CTR computation.

You are computing per-campaign click-through rate (CTR) for an ad system. Given a list of impression events and a list of click events, compute the CTR for each campaign over the trailing `window_hours` window ending at a query timestamp `query_ts`. Each impression is a dict `{"imp_id", "campaign_id", "ts"}` and each click is a dict `{"click_id", "imp_id", "campaign_id", "ts"}`, where `ts` is an ISO 8601 timestamp string. Rules: - An event is in the window if `window_start < ts <= query_ts`, where `window_start = query_ts - window_hours`. - Deduplicate clicks: count at most ONE click per `imp_id`, keeping the EARLIEST click. Whether a click is in the window is decided by that earliest click's timestamp. - Ignore any click whose `imp_id` never appeared in the impressions list. - CTR = dedup_clicks_in_window / impressions_in_window, rounded to 4 decimals (0.0 when there are no impressions). Return a list of tuples `(campaign_id, impressions_in_window, dedup_clicks_in_window, ctr_in_window)`, sorted ascending by `campaign_id`. Include a campaign if it has at least one impression OR one dedup click in the window. Data-structure note for the discussion: a production streaming version would keep, per campaign, a monotonic deque (or min-heap keyed by ts) of in-window impressions and dedup clicks plus a hash map `imp_id -> earliest_click_ts` for O(log n) amortized expiry/insertion; this offline version applies the same window/dedup logic in one pass.

Constraints

  • Timestamps are valid ISO 8601 strings, sorted non-decreasing by ts within each stream.
  • An event is in-window iff window_start < ts <= query_ts.
  • At most one click is counted per imp_id (earliest wins).
  • Clicks whose imp_id never appears in impressions are ignored.
  • CTR is rounded to 4 decimal places; 0.0 when impressions_in_window == 0.
  • Result is sorted ascending by campaign_id.

Examples

Input: ([{"imp_id": "i1", "campaign_id": "A", "ts": "2026-01-01T10:00:00+00:00"}, {"imp_id": "i2", "campaign_id": "A", "ts": "2026-01-01T11:00:00+00:00"}, {"imp_id": "i3", "campaign_id": "B", "ts": "2026-01-01T12:00:00+00:00"}], [{"click_id": "c1", "imp_id": "i1", "campaign_id": "A", "ts": "2026-01-01T10:05:00+00:00"}, {"click_id": "c2", "imp_id": "i1", "campaign_id": "A", "ts": "2026-01-01T10:06:00+00:00"}], "2026-01-01T13:00:00+00:00")

Expected Output: [('A', 2, 1, 0.5), ('B', 1, 0, 0.0)]

Explanation: Campaign A has 2 impressions and 2 clicks on i1; dedup keeps the earliest click only -> 1 click, CTR 1/2 = 0.5. Campaign B has 1 impression, no clicks -> CTR 0.0. All events fall within the 24h window ending 13:00.

Input: ([], [], "2026-01-01T13:00:00+00:00")

Expected Output: []

Explanation: No events at all -> empty result.

Input: ([{"imp_id": "i1", "campaign_id": "A", "ts": "2026-01-01T10:00:00+00:00"}], [{"click_id": "c1", "imp_id": "iX", "campaign_id": "A", "ts": "2026-01-01T10:05:00+00:00"}], "2026-01-01T11:00:00+00:00")

Expected Output: [('A', 1, 0, 0.0)]

Explanation: The click references imp_id 'iX' which was never seen as an impression, so it is ignored. Campaign A has 1 impression and 0 valid clicks -> CTR 0.0.

Input: ([{"imp_id": "i1", "campaign_id": "A", "ts": "2026-01-01T10:00:00+00:00"}, {"imp_id": "i2", "campaign_id": "A", "ts": "2026-01-03T10:00:00+00:00"}], [{"click_id": "c1", "imp_id": "i1", "campaign_id": "A", "ts": "2026-01-01T10:05:00+00:00"}, {"click_id": "c2", "imp_id": "i2", "campaign_id": "A", "ts": "2026-01-03T10:05:00+00:00"}], "2026-01-03T11:00:00+00:00")

Expected Output: [('A', 1, 1, 1.0)]

Explanation: Querying at 2026-01-03 11:00, the 24h window starts 2026-01-02 11:00. The Jan-01 impression/click are expired out of the window; only i2 and its click remain -> 1 impression, 1 click, CTR 1.0.

Input: ([{"imp_id": "i1", "campaign_id": "A", "ts": "2026-01-01T10:00:00+00:00"}, {"imp_id": "i2", "campaign_id": "A", "ts": "2026-01-01T10:10:00+00:00"}, {"imp_id": "i3", "campaign_id": "A", "ts": "2026-01-01T10:20:00+00:00"}], [{"click_id": "c1", "imp_id": "i2", "campaign_id": "A", "ts": "2026-01-01T10:15:00+00:00"}, {"click_id": "c2", "imp_id": "i3", "campaign_id": "A", "ts": "2026-01-01T10:25:00+00:00"}, {"click_id": "c3", "imp_id": "i3", "campaign_id": "A", "ts": "2026-01-01T10:22:00+00:00"}], "2026-01-01T11:00:00+00:00")

Expected Output: [('A', 3, 2, 0.6667)]

Explanation: 3 impressions on A. i2 has one click; i3 has two clicks (a later c2 and an earlier, late-arriving c3) -> dedup keeps the earliest (10:22) so it counts once. 2 dedup clicks / 3 impressions = 0.6667.

Hints

  1. Build a map imp_id -> campaign_id from ALL impressions so you can validate and attribute clicks, but only count impressions whose ts falls in the window.
  2. Deduplicate clicks by tracking the earliest timestamp per imp_id BEFORE applying the window filter to clicks.
  3. Decide a click's window membership by its earliest deduped timestamp, not by any later duplicate.
  4. A campaign appears in the output if it has any in-window impression OR any in-window dedup click; guard division by zero for CTR.
Last updated: Jun 26, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Find Windows Containing a Target - Roblox (medium)
  • Implement Sliding-Window Rate Limiter - Roblox (medium)
  • Find target-heavy sliding windows - Roblox (medium)
  • Find most frequent call path in logs - Roblox (medium)
  • Track Highest-Earning Experience - Roblox (medium)