Detect sessions and gaps using SQL LEAD
Company: Apple
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Write a single ANSI-SQL query that (a) assigns per-user session_ids when the gap between consecutive events exceeds 30 minutes, (b) computes session_start, session_end, event_count, session_length_seconds, and next_session_gap_seconds (time from this session_end to the next session_start for the same user), and (c) uses window functions including LEAD at least once.
Schema:
- events(user_id INT, ts TIMESTAMP, action VARCHAR)
Sample data:
+---------+---------------------+---------+
| user_id | ts | action |
+---------+---------------------+---------+
| 1 | 2025-08-01 09:00:00 | view |
| 1 | 2025-08-01 09:10:00 | click |
| 1 | 2025-08-01 10:00:00 | view |
| 2 | 2025-08-01 12:00:00 | view |
| 2 | 2025-08-01 12:20:00 | click |
| 2 | 2025-08-01 12:45:00 | purchase|
| 2 | 2025-08-01 14:00:00 | view |
+---------+---------------------+---------+
Requirements:
- Use LAG to detect new sessions and a running SUM to form session_ids per user.
- Use LEAD(ts) to compute next_event_ts and derive next_session_gap_seconds.
- Return columns: user_id, session_id, session_start, session_end, event_count, session_length_seconds, next_session_gap_seconds (NULL if no next session).
Quick Answer: This question evaluates proficiency in SQL time-series and event-stream manipulation, focusing on window-function techniques for sessionization, timestamp arithmetic, and aggregation to produce session-level metrics, targeting Data Manipulation (SQL/Python) competencies relevant to data scientist roles.