Ratio and Session Metrics in SQL
Asked of: Data Scientist
Last updated

-
What it is Ratio metrics divide one count by another (e.g., click-through rate = clicks/impressions; conversion rate = orders/sessions). Session metrics summarize activity within inactivity-bounded “visits” (e.g., sessions per user, events per session, session duration). In SQL, you typically construct sessions from event logs, then compute ratios over consistent denominators. (mode.com)
-
Why interviewers ask about it Teams like Meta rely on accurate ratios and sessions to evaluate product changes, funnels, and experiments. They want to see you translate messy event streams into defensible metrics, choose the right weighting (per-user vs global), and avoid denominator shifts that can invert conclusions.
-
Core ideas to know
- Define numerator/denominator precisely; apply identical filters to both to avoid inflated or deflated ratios.
- Average-of-ratios vs ratio-of-averages: choose weighting deliberately; per-user averages often behave better for CTR-like metrics. (microsoft.com)
- Sessionization pattern: LAG previous timestamp per user, flag gaps over threshold, cumulative SUM to assign session_id. (mode.com)
- Pick a sensible inactivity threshold; many tools default to 30 minutes, but tune to your product. (support.google.com)
- Use event time with a consistent timezone; beware DST/midnight boundaries when bucketing sessions and days.
- De-duplicate events and handle late arrivals; make sessionization idempotent in pipelines.
- In experiments, denominator/population changes can flip ratios (Simpson’s paradox); segment definitions must be invariant to treatment. (microsoft.com)
-
A common pitfall Candidates often filter only the numerator (e.g., clicks where country = US) but forget to apply the same filter to the denominator (impressions), producing a misleading CTR. Others compute ratio-of-sums when stakeholders wanted per-user weighting, or mix session definitions across platforms (e.g., SQL sessions vs GA4’s defaults), causing double-counting and broken comparisons. In experiments, a feature that increases exposures can drop CTR even as total clicks rise; inspecting numerator and denominator separately prevents misreads. (microsoft.com)
-
Further reading
- Mode: Finding User Sessions with SQL — Clear, practical LAG + running-sum pattern for sessionization without joins. (mode.com)
- Microsoft Research: A Dirty Dozen Metric Pitfalls — Explains ratio pitfalls, average-of-ratios vs ratio-of-averages, and denominator shifts in experiments. (microsoft.com)
- Google Analytics Help: GA4 Sessions — Authoritative defaults and configuration for session timeouts; helpful for aligning SQL with tool-defined sessions. (support.google.com)