Temporal Event Attribution In SQL
Asked of: Data Scientist
Last updated

What's being tested
Ability to translate temporal attribution rules into correct, efficient SQL: selecting the right lookback window, deduplicating many-to-many joins, and writing window or lateral queries that scale and handle edge cases like ties and timezones.
Core knowledge
- Last-touch vs first-touch vs multi-touch and common business lookback windows (7, 28, 90 days).
- Use lateral joins/APPLY or window functions (ROW_NUMBER() OVER PARTITION) to pick the nearest event.
- Prefer indexed predicates (user_id, event_timestamp) and sargable WHERE clauses for performance.
- Handle identical timestamps explicitly (tie-breaker keys such as event_id).
- Use RANGE vs ROWS frame semantics carefully for time-based windows.
- Avoid naive join-on-inequality (user_id AND event_time <= conv_time) without limiting—causes combinatorial explosion.
- Normalize timestamps to a single timezone and cast to appropriate SQL datetime types.
Worked example — Representative question: "Assign each conversion to the most recent prior event within 7 days"
First ask clarifying questions: what are the table keys (user_id?), are timestamps timezone-aware, and does "prior" include same-second events? Then choose an approach: for each conversion row, find the single event with event_time <= conv_time and event_time >= conv_time - INTERVAL '7 days'. Implement with a lateral join (SQL Server APPLY or Postgres LATERAL) selecting LIMIT 1 ordered by event_time DESC, or compute ROW_NUMBER() partitioned by conversion_id after joining and filter row_number=1. Ensure there's an index on (user_id, event_time) and decide tie-breaker (event_id) to make selection deterministic.
A common pitfall
The tempting but wrong approach is a plain inequality join (events.user_id = conv.user_id AND events.time BETWEEN conv.time - INTERVAL '7 days' AND conv.time) and then aggregating, which double-counts when multiple events qualify — giving inflated attributions. Another common mistake is using non-sargable expressions (e.g., applying DATE() to event_time in the WHERE) which prevents index use and causes slow full scans. Always limit to one event per conversion and preserve deterministic tie-breakers.
Further reading
- PostgreSQL documentation: Window Functions — explains ROW_NUMBER, RANGE/ROWS framing.
- BigQuery/Google Cloud docs: Analytic functions and LATERAL JOIN examples for time-based joins.
Related concepts
- Event Attribution, Deduplication, And Cohort SQL
- SQL Event Analytics
- SQL Joins, Deduplication, and Event Attribution
- SQL Event Log AnalyticsData Manipulation (SQL/Python)
- Temporal Event Processing And Interval AlgorithmsCoding & Algorithms
- SQL Analytics And Event Data ManipulationData Manipulation (SQL/Python)