PracHub

Quick Overview

This question evaluates a candidate's competency in time-based event attribution and temporal interval manipulation using pandas, covering interval merging, subscription-aligned revenue aggregation, anomaly flagging, and handling edge cases like overlapping or back-to-back subscriptions.

  • Medium
  • Amazon
  • Data Manipulation (SQL/Python)
  • Data Scientist

Transform event logs with subscription windows in pandas

Company: Amazon

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

Using pandas, compute user-level subscription-aligned revenue and anomalies for September 2025.

DataFrames:

  • events(user_id:int, ts:UTC datetime, event:str in {'view','add_to_cart','purchase'}, product_id:int, price_usd:float)
  • subs(user_id:int, plan:str, start_ts:UTC datetime, end_ts:UTC datetime or NaT)

Requirements:

  • (1) For each user, compute active_subscription_days in 2025-09 and total purchase revenue that occurred while the user was actively subscribed (purchase ts ∈ [start_ts, end_ts)).
  • (2) Flag purchases outside any active window.
  • (3) If a user has overlapping or back-to-back subscriptions, merge them into minimal disjoint half-open intervals before attribution.
  • (4) Output two DataFrames: user_month_agg(user_id, month='2025-09', active_subscription_days:int, subscribed_purchase_revenue:float, out_of_window_purchases:int) and anomalies(user_id, ts, price_usd, reason='outside_window'|'overlap_fixed').
  • (5) Solve with vectorized operations (e.g., IntervalIndex, merge_asof, or interval trees) and discuss scalability to 100M events with limited RAM (chunking, dtype optimization, parquet scans).

Small sample:

subs:

  • (1,'pro','2025-08-28T00:00Z','2025-09-10T00:00Z')
  • (1,'pro','2025-09-10T00:00Z','2025-10-10T00:00Z')
  • (2,'basic','2025-09-05T12:00Z',NaT)

events:

  • (1,'2025-09-09T22:00Z','purchase',101,19.99)
  • (1,'2025-09-15T03:00Z','purchase',102,5.00)
  • (2,'2025-09-01T01:00Z','purchase',103,9.99)
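
One possible approach to requirements (1) through (4), sketched on the small sample: merge each user's windows with a running maximum of earlier end times, then attribute purchases with pd.merge_asof. This is an illustrative sketch, not a reference solution. The helper names merge_windows and attribute, the FAR_FUTURE sentinel for a NaT end_ts, and the reading of active_subscription_days as whole days of in-month subscribed time are all assumptions that the prompt does not state. The prompt also does not say which rows should carry reason='overlap_fixed', so the sketch emits only 'outside_window'.

```python
import pandas as pd

MONTH_START = pd.Timestamp("2025-09-01", tz="UTC")
MONTH_END = pd.Timestamp("2025-10-01", tz="UTC")
# Assumption: a far-future sentinel stands in for an open-ended (NaT) end_ts.
FAR_FUTURE = pd.Timestamp("2100-01-01", tz="UTC")

subs = pd.DataFrame({
    "user_id": [1, 1, 2],
    "plan": ["pro", "pro", "basic"],
    "start_ts": pd.to_datetime(
        ["2025-08-28T00:00Z", "2025-09-10T00:00Z", "2025-09-05T12:00Z"], utc=True),
    "end_ts": pd.to_datetime(
        ["2025-09-10T00:00Z", "2025-10-10T00:00Z", None], utc=True),
})
events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "ts": pd.to_datetime(
        ["2025-09-09T22:00Z", "2025-09-15T03:00Z", "2025-09-01T01:00Z"], utc=True),
    "event": ["purchase"] * 3,
    "product_id": [101, 102, 103],
    "price_usd": [19.99, 5.00, 9.99],
})


def merge_windows(subs):
    """Collapse overlapping or back-to-back windows into disjoint [start, end) rows."""
    s = subs[["user_id", "start_ts", "end_ts"]].copy()
    s["end_ts"] = s["end_ts"].fillna(FAR_FUTURE)
    s = s.sort_values(["user_id", "start_ts"]).reset_index(drop=True)
    # Running max of earlier end times within each user (NaT on a user's first row).
    prev_end = s.groupby("user_id")["end_ts"].transform(lambda x: x.cummax().shift())
    # A new block starts only when there is a real gap; start == prev_end is merged.
    new_block = prev_end.isna() | (s["start_ts"] > prev_end)
    s["block"] = new_block.cumsum()
    return s.groupby(["user_id", "block"], as_index=False).agg(
        start_ts=("start_ts", "min"), end_ts=("end_ts", "max"))


def attribute(events, merged):
    """Match each September purchase to the window (if any) that contains it."""
    p = events[(events["event"] == "purchase")
               & (events["ts"] >= MONTH_START) & (events["ts"] < MONTH_END)]
    att = pd.merge_asof(
        p.sort_values("ts"),
        merged[["user_id", "start_ts", "end_ts"]].sort_values("start_ts"),
        left_on="ts", right_on="start_ts", by="user_id", direction="backward")
    # Windows are disjoint, so the latest start <= ts is the only candidate.
    att["in_window"] = att["ts"] < att["end_ts"]  # NaT (no match) compares False
    return att


merged = merge_windows(subs)
att = attribute(events, merged)

# Assumption: "active days" = whole days of total in-month subscribed time.
overlap = (merged["end_ts"].clip(upper=MONTH_END)
           - merged["start_ts"].clip(lower=MONTH_START)).clip(lower=pd.Timedelta(0))
days = overlap.groupby(merged["user_id"]).sum().dt.days

users = pd.Index(sorted(set(subs["user_id"]) | set(events["user_id"])), name="user_id")
revenue = att[att["in_window"]].groupby("user_id")["price_usd"].sum()
outside = (~att["in_window"]).groupby(att["user_id"]).sum()
user_month_agg = pd.DataFrame({
    "month": "2025-09",
    "active_subscription_days": days.reindex(users, fill_value=0),
    "subscribed_purchase_revenue": revenue.reindex(users, fill_value=0.0),
    "out_of_window_purchases": outside.reindex(users, fill_value=0).astype(int),
}, index=users).reset_index()
anomalies = (att.loc[~att["in_window"], ["user_id", "ts", "price_usd"]]
             .assign(reason="outside_window"))
```

On the sample, user 1's two back-to-back windows collapse into a single window from 2025-08-28 to 2025-10-10, so both of that user's purchases count as subscribed revenue, while user 2's purchase on 2025-09-01 predates the subscription start and lands in anomalies. For requirement (5), one option under the same assumptions is to keep the merged windows in memory (they are usually far smaller than the event log), stream purchases in chunks through attribute, and sum the per-chunk aggregates.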

Last updated: Mar 29, 2026
