Window Functions, Cohorting, and Time Series SQL
Asked of: Data Scientist
Last updated

-
What it is Window functions compute per-row metrics across a related set of rows without collapsing them (rankings, running totals, lag/lead). Cohorting groups entities (e.g., users) by a start event and tracks outcomes over relative time. Time series SQL aggregates by time grains and handles gaps, seasonality, and rolling windows.
-
Why interviewers ask about it Product analytics lives in event tables; you’re expected to turn fuzzy retention questions into precise SQL that scales. At Meta-like companies, correctness under edge cases (partial periods, duplicates, timezone drift) and performance on large datasets separate strong candidates.
-
Core ideas to know
- OVER(PARTITION BY … ORDER BY …) defines the window; ROWS vs RANGE frames change running-total semantics and tie handling.
- Common analytics: row_number/rank, lag/lead, moving averages, cumulative sums, percentiles over ordered frames.
- Window functions can’t appear in WHERE/GROUP BY; filter via subqueries/CTEs (or QUALIFY in engines that support it).
- Cohorts require clear definitions: cohort event, return event, time grain, and “exact day” vs “within window.”
- Build clean bases: one signup row per user; one distinct activity per user-period to avoid double-counting.
- Time series: date_trunc or time_bucket for aggregation; gapfilling/interpolation/LOCF to fill missing periods; be explicit about time zones.
- Guardrails: exclude incomplete current periods; choose stable ORDER BY keys to make rankings reproducible.
-
A common pitfall Candidates write syntactically correct queries that answer the wrong retention question. They mix calendar months with “months since signup,” count events instead of users, or include incomplete current periods that depress late-cohort rates. Another frequent miss is relying on default window frames (RANGE) that aggregate tied rows unexpectedly, or trying to filter on a windowed column in WHERE instead of wrapping in a subquery. Small mistakes like these yield believable but inconsistent numbers across dashboards.
-
Further reading
- PostgreSQL docs — Window Functions: authoritative definitions, syntax, and frame behavior with examples. https://www.postgresql.org/docs/current/functions-window.html
- SQLPad — SQL Retention Queries: Cohorts, Return Windows, and Edge Cases: crisp breakdown of cohort definitions and the traps that break retention SQL. https://sqlpad.io/tutorial/sql-retention-queries-cohorts-return-windows-and-the-edge-cases-that-break-them/
- TimescaleDB docs — time_bucket_gapfill(): practical guidance for contiguous time buckets and gap filling in time series SQL. https://docs.timescale.com/api/latest/hyperfunctions/gapfilling/time_bucket_gapfill/
Related concepts
- Cohorts And Window Functions In SQL
- SQL Joins, Aggregations, And Window Functions
- SQL Joins, Aggregation, And Window Functions
- Window FunctionsData Manipulation (SQL/Python)
- SQL Window Functions And AnalyticsData Manipulation (SQL/Python)
- SQL Window Functions And Temporal JoinsData Manipulation (SQL/Python)