Implement sliding-window device anomaly
Company: PayPal
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of streaming algorithms, sliding-window state management, per-user data structures, and handling out-of-order events under strict time and space complexity constraints.
Constraints
- Events are given in non-decreasing timestamp order.
- Window length is 600 seconds (10 minutes), inclusive: [ts - 599, ts].
- Only events with success == True count toward the distinct-device total.
- Alert threshold is >= 4 distinct device_ids.
- Emit at most one alert per user per distinct window_end_ts.
- 0 <= len(events); timestamps fit in a 64-bit integer.
Examples
Input: ([],)
Expected Output: []
Explanation: Empty stream produces no alerts.
Input: ([('u1', 0, 'd1', True), ('u1', 100, 'd2', True), ('u1', 200, 'd3', True)],)
Expected Output: []
Explanation: Only 3 distinct devices in the window — below the threshold of 4.
Input: ([('u1', 0, 'd1', True), ('u1', 100, 'd2', True), ('u1', 200, 'd3', True), ('u1', 300, 'd4', True)],)
Expected Output: [('u1', 0, 300, 4)]
Explanation: The 4th distinct device at ts=300 hits the threshold; oldest event still in window is ts=0, so window_start_ts=0.
Input: ([('u1', 0, 'd1', True), ('u1', 100, 'd2', True), ('u1', 200, 'd3', True), ('u1', 300, 'd4', False), ('u1', 400, 'd4', True)],)
Expected Output: [('u1', 0, 400, 4)]
Explanation: The failed login of d4 at ts=300 is ignored; the successful d4 at ts=400 is the 4th distinct device and fires the alert.
Input: ([('u1', 0, 'd1', True), ('u1', 100, 'd2', True), ('u1', 200, 'd3', True), ('u1', 700, 'd4', True)],)
Expected Output: []
Explanation: At ts=700 the window is [101, 700]; d1 (ts=0) has been evicted, leaving only d2, d3, d4 = 3 distinct devices.
Input: ([('u1', 0, 'd1', True), ('u1', 0, 'd2', True), ('u1', 0, 'd3', True), ('u1', 0, 'd4', True), ('u1', 0, 'd1', True)],)
Expected Output: [('u1', 0, 0, 4)]
Explanation: Four distinct devices all at ts=0 fire one alert; the repeat of d1 (and the duplicate window_end_ts=0) does not fire a second alert.
Input: ([('u1', 0, 'd1', True), ('u2', 0, 'dA', True), ('u1', 50, 'd2', True), ('u2', 50, 'dB', True), ('u1', 100, 'd3', True), ('u2', 100, 'dC', True), ('u1', 150, 'd4', True), ('u2', 150, 'dD', True)],)
Expected Output: [('u1', 0, 150, 4), ('u2', 0, 150, 4)]
Explanation: Two users are tracked independently; each reaches 4 distinct devices at ts=150 and fires its own alert.
Input: ([('u1', 1000, 'd1', True), ('u1', 1100, 'd2', True), ('u1', 1200, 'd3', True), ('u1', 1300, 'd4', True), ('u1', 1400, 'd5', True)],)
Expected Output: [('u1', 1000, 1300, 4), ('u1', 1000, 1400, 5)]
Explanation: Threshold first met at ts=1300 (4 devices). A new device at ts=1400 keeps the whole window alive (span 400s < 600s) and fires again with distinct_devices=5 and a new window_end_ts.
Hints
- Keep one deque of (ts, device_id) per user. Append on each successful event, then pop from the front while the oldest timestamp is older than ts - 599.
- The window must include the current event, so window_start = ts - 600 + 1; the oldest surviving deque entry's ts is window_start_ts in the alert.
- To hit the amortized O(1)-O(log K) target, maintain a device_id -> count map alongside the deque: increment on append, decrement on eviction, and track distinct as the number of keys with count > 0 instead of rebuilding a set each time.
- Failed logins (success == False) are skipped entirely — they neither count nor enter the deque.