PracHub

Process real-time enter/exit events and actives

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competence in event-time processing, stateful windowed aggregations, deduplication, late and out-of-order event handling, and streaming-system design, including state management and scalability.

Company: Amazon

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

Oct 13, 2025, 9:49 PM

You receive a real-time stream of events with schema: user_id (str), channel (str), event_type ("enter"|"exit"), ts (UTC ISO 8601 timestamp). A user can ‘enter’ and ‘exit’ multiple times per channel; events may arrive up to 5 minutes late or out of order. Tasks:

  1. Batch (pandas): Given a day of data, compute per-channel active_user_count for every 1-minute tumbling window, assuming a missing exit implies an implicit exit at the same user's next enter in the same channel, or at day end (state your assumption). Handle overlapping sessions and duplicate events robustly. Output columns: window_start, channel, active_user_count.
  2. Top channels: For each minute, return the top 3 channels by active_user_count (ties broken lexicographically by channel name), and include the dense rank within each minute.
  3. Streaming design: Outline a solution that produces the same outputs with event-time windows, 5-minute allowed lateness, and idempotent processing (exactly-once semantics if possible). Discuss state keys, watermarks, late-event handling, and how you would compact long-lived state.
  4. Correctness and performance: Explain how you’d detect and repair clock skew, dedupe near-duplicates, and bound memory when the active set spikes. Give the big-O cost for the steady state and the worst case.
  5. Edge cases: How do you reconcile an ‘exit’ with no prior ‘enter’, or overlapping sessions by the same user in the same channel?
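One way to approach tasks 1, 2 and 5 in pandas is sketched below. It is a sketch under stated assumptions, not a reference solution. Assumptions: exact duplicates on (user_id, channel, event_type, ts) are dropped; a repeated enter is treated as an implicit exit at that enter followed by a new session, which also absorbs near-duplicate enters without double counting; an exit with no open session is ignored; a session still open at the end is closed at the end of the UTC day containing the earliest event; at identical timestamps an exit is applied before an enter; and a user counts as active in a minute if any part of a [start, end) session falls inside it. The function names build_sessions, active_counts and top3_per_minute are hypothetical.

```python
import pandas as pd

ONE_NS = pd.Timedelta(1, unit="ns")


def build_sessions(events: pd.DataFrame) -> pd.DataFrame:
    """Turn enter/exit events into non-overlapping [start, end) sessions per (user, channel)."""
    df = events.copy()
    df["ts"] = pd.to_datetime(df["ts"], utc=True)
    # Exact duplicates (same user, channel, type and timestamp) are dropped outright.
    df = df.drop_duplicates(["user_id", "channel", "event_type", "ts"])
    # Assumption: at identical timestamps an exit is applied before an enter.
    df["type_order"] = df["event_type"].map({"exit": 0, "enter": 1})
    df = df.sort_values(["user_id", "channel", "ts", "type_order"])
    # Assumption: the input covers one UTC day starting at the earliest event's date.
    day_end = df["ts"].min().floor("D") + pd.Timedelta(days=1)

    sessions = []
    for (user, channel), grp in df.groupby(["user_id", "channel"], sort=False):
        open_start = None
        for ts, etype in zip(grp["ts"], grp["event_type"]):
            if etype == "enter":
                if open_start is not None:
                    # Overlapping enter: implicit exit at this enter, then reopen.
                    sessions.append((user, channel, open_start, ts))
                open_start = ts
            elif open_start is not None:
                sessions.append((user, channel, open_start, ts))
                open_start = None
            # An exit with no open session is ignored (assumption).
        if open_start is not None:
            # Missing exit: implicit exit at day end.
            sessions.append((user, channel, open_start, day_end))
    return pd.DataFrame(sessions, columns=["user_id", "channel", "start", "end"])


def active_counts(events: pd.DataFrame) -> pd.DataFrame:
    """active_user_count per (window_start, channel) for every 1-minute tumbling window."""
    sessions = build_sessions(events)
    rows = []
    for user, channel, start, end in sessions.itertuples(index=False):
        # A user is active in window [w, w + 1 min) if the session overlaps it.
        for w in pd.date_range(start.floor("min"), (end - ONE_NS).floor("min"), freq="min"):
            rows.append((w, channel, user))
    hits = pd.DataFrame(rows, columns=["window_start", "channel", "user_id"])
    # drop_duplicates turns this into a distinct-user count per window and channel.
    counts = hits.drop_duplicates().groupby(["window_start", "channel"]).size()
    # Fill zero-count windows so every minute of the day appears for every channel.
    day_start = pd.to_datetime(events["ts"], utc=True).min().floor("D")
    grid = pd.MultiIndex.from_product(
        [pd.date_range(day_start, periods=1440, freq="min"), sorted(events["channel"].unique())],
        names=["window_start", "channel"],
    )
    return counts.reindex(grid, fill_value=0).rename("active_user_count").reset_index()


def top3_per_minute(counts: pd.DataFrame) -> pd.DataFrame:
    """Top 3 channels per minute; ties on the count are broken by channel name."""
    df = counts.sort_values(
        ["window_start", "active_user_count", "channel"], ascending=[True, False, True]
    )
    df["dense_rank"] = (
        df.groupby("window_start")["active_user_count"]
        .rank(method="dense", ascending=False)
        .astype(int)
    )
    return df.groupby("window_start").head(3).reset_index(drop=True)
```

Sorting costs O(E log E) for E events, and the minute expansion in active_counts is linear in the total number of session-minutes, so sessions left open until day end dominate the cost. The dense rank is computed on the counts, so tied channels share a rank while the lexicographic sort decides which three rows are kept; channels with zero active users can appear in the top 3 when fewer than three channels are active.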

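For task 3, per-minute counts can also be produced incrementally. The sketch below is a single-process, in-memory illustration in plain Python, not a production design; the class ActiveUserStream and its methods are hypothetical. It assumes the watermark is the maximum event time seen minus the 5-minute allowed lateness. Accepted events wait in a min-heap and are applied in event-time order once the watermark passes them. Events older than the watermark go to a late side output instead of changing state. Replays of an already accepted (user_id, channel, event_type, ts) key are ignored, which makes reprocessing idempotent. Session state is keyed by (channel, user_id), a window is emitted once the watermark reaches its end, and state is compacted by discarding emitted windows and dedupe keys behind the watermark. Exactly-once output in a real deployment would additionally need checkpointed state and an idempotent or transactional sink, which this sketch does not model.

```python
import heapq
from collections import defaultdict
from datetime import datetime, timedelta

MINUTE = timedelta(minutes=1)


def floor_min(t: datetime) -> datetime:
    return t.replace(second=0, microsecond=0)


class ActiveUserStream:
    """Event-time per-minute active-user counts with bounded allowed lateness (sketch)."""

    def __init__(self, allowed_lateness: timedelta = timedelta(minutes=5)):
        self.lateness = allowed_lateness
        self.max_ts = None               # highest event time accepted so far
        self.buffer = []                 # min-heap of pending events, ordered by event time
        self.seen = set()                # dedupe keys of accepted events (idempotent replays)
        self.open = {}                   # (channel, user) -> start of the open session
        self.active = defaultdict(set)   # (window_start, channel) -> users from closed sessions
        self.next_window = None          # earliest window not yet emitted
        self.channels = set()
        self.late = []                   # side output: events older than the watermark

    def process(self, user, channel, event_type, ts_iso):
        ts = datetime.fromisoformat(ts_iso)
        key = (user, channel, event_type, ts)
        if self.max_ts is not None and ts < self.max_ts - self.lateness:
            self.late.append(key)        # beyond allowed lateness: state is not mutated
            return []
        if key in self.seen:
            return []                    # duplicate delivery: no-op
        self.seen.add(key)
        self.channels.add(channel)
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        order = 0 if event_type == "exit" else 1   # assumption: exit before enter on ties
        heapq.heappush(self.buffer, (ts, order, user, channel, event_type))
        return self._advance(self.max_ts - self.lateness)

    def flush(self, day_end_iso):
        # End of day: still-open sessions count as active up to day_end (implicit exit).
        return self._advance(datetime.fromisoformat(day_end_iso))

    def _advance(self, wm):
        # 1) Apply buffered events with event time <= watermark, in event-time order.
        while self.buffer and self.buffer[0][0] <= wm:
            ts, _, user, channel, etype = heapq.heappop(self.buffer)
            if self.next_window is None:
                self.next_window = floor_min(ts)
            key = (channel, user)
            if etype == "enter":
                if key in self.open:
                    self._close(key, ts)      # implicit exit at the next enter
                self.open[key] = ts
            elif key in self.open:
                self._close(key, ts)
            # An exit with no open session is ignored (assumption).
        # 2) Emit every window that ends at or before the watermark.
        out = []
        while self.next_window is not None and self.next_window + MINUTE <= wm:
            w, w_end = self.next_window, self.next_window + MINUTE
            open_now = defaultdict(set)
            for (ch, user), start in self.open.items():
                if start < w_end:
                    open_now[ch].add(user)
            for ch in sorted(self.channels):
                users = self.active.pop((w, ch), set()) | open_now[ch]
                out.append((w, ch, len(users)))
            self.next_window = w_end
        # 3) Compact dedupe state: keys behind the watermark can only arrive as late now.
        self.seen = {k for k in self.seen if k[3] >= wm}
        return out

    def _close(self, key, end):
        start = self.open.pop(key)
        channel, user = key
        # Only windows not yet emitted; emitted ones already counted this session as open.
        w = max(floor_min(start), self.next_window)
        while w < end:
            self.active[(w, channel)].add(user)
            w += MINUTE
```

Because buffered events are applied in event-time order, an exit that arrives before an earlier-stamped event on the same key (within the lateness bound) is still paired correctly. Each emitted window costs time proportional to the open sessions plus the channel count, and memory is bounded by open sessions, events buffered inside the lateness horizon, and sets for windows not yet emitted. Unlike a batch pass over a full day, a channel first seen mid-stream gets no back-filled zero rows for earlier minutes. The timestamps use an explicit +00:00 offset because datetime.fromisoformat only accepts a trailing Z from Python 3.11 onward.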