Compute unconnected 60s posts and reactions averages
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Given these tables and sample data, write SQL that answers both tasks below. Use today = 2025-09-01 and interpret "last/past 7 days" as the inclusive window 2025-08-26 through 2025-09-01. Schemas:
info_stream_views
- post_id BIGINT -- ID of post
- viewer_id BIGINT -- ID of user who viewed the post
- relationship STRING -- {Friend, Followee, Unconnected}
- duration DOUBLE -- seconds watched for that view event
- ds STRING -- 'YYYY-MM-DD' event date
post_reactions
- post_id INTEGER
- viewer_id INTEGER
- post_action STRING -- {like, comment, reshare}
- ds STRING -- 'YYYY-MM-DD'
Sample rows (small, illustrative):
info_stream_views
post_id | viewer_id | relationship | duration | ds
1 | 11 | Unconnected | 70.0 | 2025-08-31
1 | 10 | Friend | 45.0 | 2025-08-31
2 | 12 | Unconnected | 30.0 | 2025-08-30
2 | 13 | Friend | 120.5 | 2025-08-30
3 | 14 | Followee | 61.0 | 2025-08-26
3 | 11 | Unconnected | 62.0 | 2025-08-27
post_reactions
post_id | viewer_id | post_action | ds
1 | 11 | like | 2025-08-31
1 | 11 | comment | 2025-08-31
2 | 12 | like | 2025-08-30
2 | 13 | reshare | 2025-08-30
3 | 11 | like | 2025-08-27
Task A (distinct-post count): Return the number of distinct post_id that have at least one Unconnected view with duration > 60 seconds within 2025-08-26..2025-09-01 (inclusive). If multiple view rows exist for the same (post_id, viewer_id, ds), deduplicate by taking MAX(duration) for that key before applying the > 60s filter.
Task B (friend vs unconnected averages): Compute the average number of reactions per post attributable to Friend vs Unconnected viewers within the same 7-day window. Classify each reaction by joining post_reactions to info_stream_views on (post_id, viewer_id, ds); if multiple matches exist for a given (post_id, viewer_id, ds), prefer the row with the greatest duration. Count all reaction types equally (each = 1). For each relationship group in {Friend, Unconnected}, define the denominator as the number of distinct post_id that appeared in info_stream_views in the window; posts with zero reactions in a group should contribute 0 to that group's average. Return two rows: relationship, avg_reactions_per_post.
Edge cases to handle explicitly: mixed relationships across days, multiple views per user per post per day, reactions without a matching view (exclude these), and ds stored as STRING.
Quick Answer: This question evaluates data manipulation skills in SQL and Python, focusing on deduplication, join logic between event and reaction tables, date handling for string-typed dates, and aggregation to compute per-post metrics.