SQL Analytical Querying
Asked of: Data Scientist
Last updated

What's being tested
Analytical querying for product data: turning raw event, user, and transaction tables into retention, revenue, ranking, overlap, and cohort metrics. Interviewers are probing whether you can write correct SQL/pandas under ambiguity: dedupe events, define time windows, handle ties, and explain metric edge cases.
Patterns & templates
-
Window functions like
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_ts)dedupe or select first/last events; always add deterministic tie-breakers. -
Cohort joins for retention: build an anchor cohort, join future activity on
user_id, constrain dates withBETWEEN, then aggregate by cohort day. -
Conditional aggregation with
SUM(CASE WHEN ... THEN 1 ELSE 0 END)orCOUNT(DISTINCT CASE WHEN ... THEN user_id END)for segmented metrics. -
Top-N ranking uses
RANK,DENSE_RANK, orROW_NUMBER; choose based on tie behavior and state the business implication. -
Set overlap / Jaccard: dedupe item-user pairs, self-join by entity, compute ; avoid double-counting symmetric pairs.
-
pandasgroupby pipelines mirrorSQL:drop_duplicates,groupby,agg,merge,rank,shift; watch index alignment and timezone-aware timestamps. -
Binary search over ordered logs is
O(log n)probes when records are date-sorted; inSQL, compose bounded queries instead of scanning all dates.
Common pitfalls
Pitfall: Counting events instead of users will inflate retention, active-user, and conversion metrics when users generate multiple rows.
Pitfall: Using local calendar dates without clarifying
UTCboundaries can shift next-day retention and revenue windows.
Pitfall: Forgetting tie semantics in top-category queries leads to inconsistent results; explicitly choose
RANK,DENSE_RANK, orROW_NUMBER.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Featured in interview prep guides
Practice questions
- Write Queries for Pinterest Engagement TasksPinterest · Data Scientist · Technical Screen · medium
- Write SQL for top categories and highly active usersPinterest · Data Scientist · Technical Screen · easy
- Write SQL to compute max-overlap listsPinterest · Data Scientist · Onsite · Medium
- Write windowed retention and ARPU SQLPinterest · Data Scientist · Onsite · Medium
- Write SQL and pandas for shopping eventsPinterest · Data Scientist · Technical Screen · Medium
- Write SQL to rank categories by impressionsPinterest · Data Scientist · Onsite · Medium
- Compute CTR by format for new US usersPinterest · Data Scientist · Technical Screen · Medium
- Implement Binary Search for Policy Violation LogsPinterest · Data Scientist · Onsite · Medium
- Analyze Global Engagement and Impressions with SQL QueriesPinterest · Data Scientist · Onsite · Medium
Related concepts
- SQL AnalyticsData Manipulation (SQL/Python)
- SQL Analytics Joins, Aggregations, And Windows
- SQL Product AnalyticsData Manipulation (SQL/Python)
- SQL Window Functions And Analytical QueryingData Manipulation (SQL/Python)
- SQL Analytics And Event Data ManipulationData Manipulation (SQL/Python)
- SQL Analytical Querying And Data ModelingData Manipulation (SQL/Python)