Using pandas only, compute banner and story metrics. Assume today is 2025-09-01 and 'last 7 days' means 2025-08-26 to 2025-09-01 inclusive. You are given two DataFrames: events and users. events columns: event_id (int), user_id (int), ts (UTC datetime), event_type (string in {'banner_impression','banner_click','story_view','story_post','back_navigation'}), banner_id (nullable int), story_id (nullable int), group_id (nullable int), dwell_ms (nullable int). users columns: user_id (int), is_bot (bool), signup_dt (date). A tiny sample of events (chronologically):
1, 10, 2025-08-31T12:00:00Z, banner_impression, 7, null, null, null
2, 10, 2025-08-31T12:00:01Z, banner_click, 7, null, null, 200
3, 10, 2025-08-31T12:00:02Z, back_navigation, null, null, null, null
4, 11, 2025-08-31T13:00:00Z, banner_impression, 7, null, null, null
5, 11, 2025-08-31T13:00:05Z, banner_click, 7, null, null, 1200
6, 12, 2025-08-29T09:00:00Z, story_post, null, 101, 5, null
Tasks:
-
Define accidental clicks as banner_click events with dwell_ms < 500 OR followed by a back_navigation by the same user within 2 seconds; exclude these from CTR.
-
Compute daily per-banner CTR over the last 7 days, excluding bots (users.is_bot = True) and accidental clicks. Output: date, banner_id, impressions, valid_clicks, ctr.
-
Compute user-level 7-day CTR (same exclusions) and the distribution by signup cohort (signup_dt week).
-
For the Group Story feature, compute per-user change in average session duration between pre-period 2025-08-18..2025-08-24 and post-period 2025-08-25..2025-09-01. Define sessions as gaps of >30 minutes between any two events by the same user. Output: user_id, pre_avg_minutes, post_avg_minutes, delta_minutes, stories_posted_change.
-
Performance: write vectorized pandas code (groupby, rolling/window, merge_asof) without Python loops; ensure stable dtypes and avoid SettingWithCopy warnings. Mention any indices you would set and how you would test correctness on the sample.