SQL/Python Joins, Aggregations, And Window Functions

What's being tested

These questions test relational data manipulation: joining behavioral logs to entity metadata, filtering by time or action type, deduplicating at the right grain, and aggregating into user-, country-, job-, or continent-level metrics. Interviewers are probing whether you can translate metric definitions into correct SQL or pandas without double-counting or losing edge cases.

Patterns & templates

Join event logs to dimensions with INNER JOIN or LEFT JOIN; confirm whether missing metadata should drop rows or remain as NULL.
Deduplicate before aggregating using COUNT(DISTINCT col), drop_duplicates, or a CTE at the metric grain; avoid counting repeated views as unique article types.
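Conditional aggregation with SUM(CASE WHEN action='apply' THEN 1 ELSE 0 END) or, where the dialect supports it (for example PostgreSQL), COUNT(*) FILTER (WHERE ...) for event counts such as views and applies; for distinct posters or applicants, use COUNT(DISTINCT CASE WHEN action='apply' THEN user_id END) so repeat actions by one user count once.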
Window ranking via ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY metric DESC, tie_breaker ASC) for top country, first post, or deterministic tie handling.
Histogram construction by first computing per-entity values, then grouping those values: user → distinct article types → count of users per diversity bucket.
Percentage-of-group metrics use metric / SUM(metric) OVER (PARTITION BY group); cast to decimal to avoid integer division in SQL.
Python equivalent: merge, boolean filters, groupby().agg(), nunique(), rank(method='first'), and value_counts() cover most variants in pandas.
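The histogram and percentage-of-group patterns above can be sketched end to end with Python's built-in sqlite3 module. The tables, columns, and rows here are invented for illustration, and the window function assumes the bundled SQLite is version 3.25 or later.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (article_id INTEGER PRIMARY KEY, article_type TEXT);
CREATE TABLE views (user_id INTEGER, article_id INTEGER);
INSERT INTO articles VALUES (1, 'news'), (2, 'sports'), (3, 'news');
INSERT INTO views VALUES
    (10, 1), (10, 1), (10, 3),   -- user 10: repeated views, one type (news)
    (11, 1), (11, 2),            -- user 11: two types
    (12, 2), (12, 9);            -- user 12: article 9 has no metadata row
""")

rows = conn.execute("""
WITH per_user AS (
    -- Step 1: one row per user. LEFT JOIN keeps the view of article 9;
    -- COUNT(DISTINCT ...) ignores its NULL type and collapses repeats.
    SELECT v.user_id, COUNT(DISTINCT a.article_type) AS n_types
    FROM views v
    LEFT JOIN articles a ON a.article_id = v.article_id
    GROUP BY v.user_id
),
hist AS (
    -- Step 2: group the per-user values into histogram buckets.
    SELECT n_types, COUNT(*) AS n_users
    FROM per_user
    GROUP BY n_types
)
-- Step 3: share of all users; OVER () is a single partition here, and
-- multiplying by 1.0 avoids integer division.
SELECT n_types, n_users, n_users * 1.0 / SUM(n_users) OVER () AS share
FROM hist
ORDER BY n_types
""").fetchall()

for n_types, n_users, share in rows:
    print(n_types, n_users, round(share, 3))
```

In this toy data two users fall in the one-type bucket and one user in the two-type bucket. With a per-group denominator, the same query would use SUM(n_users) OVER (PARTITION BY group_col) instead of OVER ().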

Common pitfalls

Pitfall: Aggregating after a one-to-many or many-to-many join without checking grain can inflate counts, especially for views, applies, or article categories, because each row on one side is repeated once per matching row on the other.
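A small sketch of this inflation, again using sqlite3 with made-up tables (jobs, job_views, job_applies are hypothetical names): joining views and applies to the same job in one pass multiplies the rows, while aggregating each log at the job grain first keeps the counts correct.

```python
import sqlite3

# Hypothetical tables: one job with 3 views and 2 applies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE job_views (view_id INTEGER PRIMARY KEY, job_id INTEGER);
CREATE TABLE job_applies (apply_id INTEGER PRIMARY KEY, job_id INTEGER);
INSERT INTO jobs VALUES (1, 'analyst');
INSERT INTO job_views VALUES (101, 1), (102, 1), (103, 1);
INSERT INTO job_applies VALUES (201, 1), (202, 1);
""")

# Naive: both logs joined at once, so each view pairs with each apply (3 x 2 rows).
naive = conn.execute("""
SELECT j.job_id, COUNT(v.view_id) AS views, COUNT(a.apply_id) AS applies
FROM jobs j
LEFT JOIN job_views v ON v.job_id = j.job_id
LEFT JOIN job_applies a ON a.job_id = j.job_id
GROUP BY j.job_id
""").fetchall()

# Safer: aggregate each log to one row per job, then join at that grain.
fixed = conn.execute("""
WITH v AS (SELECT job_id, COUNT(*) AS views FROM job_views GROUP BY job_id),
     a AS (SELECT job_id, COUNT(*) AS applies FROM job_applies GROUP BY job_id)
SELECT j.job_id, COALESCE(v.views, 0) AS views, COALESCE(a.applies, 0) AS applies
FROM jobs j
LEFT JOIN v ON v.job_id = j.job_id
LEFT JOIN a ON a.job_id = j.job_id
ORDER BY j.job_id
""").fetchall()

print(naive)  # [(1, 6, 6)]
print(fixed)  # [(1, 3, 2)]
```

COUNT(DISTINCT v.view_id) would also repair the naive query here, but pre-aggregating makes the grain explicit and still works when the metric is a sum rather than a count.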

Pitfall: Using RANK() when the prompt expects exactly one row per group; prefer ROW_NUMBER() with explicit tie-breakers.
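To see the difference on a tie, the sketch below ranks countries within a continent on an invented country_applies table (names and numbers are illustrative only). It assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

# Hypothetical data: France and Germany tie for the top spot in Europe.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country_applies (continent TEXT, country TEXT, applies INTEGER);
INSERT INTO country_applies VALUES
    ('Europe', 'France', 50),
    ('Europe', 'Germany', 50),
    ('Europe', 'Spain', 20),
    ('Asia', 'Japan', 70),
    ('Asia', 'India', 40);
""")

def top_per_continent(window_fn, order_by):
    # window_fn and order_by are hard-coded literals below, never user input.
    sql = f"""
    SELECT continent, country
    FROM (
        SELECT continent, country,
               {window_fn}() OVER (PARTITION BY continent ORDER BY {order_by}) AS rn
        FROM country_applies
    ) AS ranked
    WHERE rn = 1
    ORDER BY continent, country
    """
    return conn.execute(sql).fetchall()

# RANK() gives both tied countries rank 1, so Europe returns two rows.
with_rank = top_per_continent("RANK", "applies DESC")

# ROW_NUMBER() with a tie-breaker returns exactly one row per continent.
with_row_number = top_per_continent("ROW_NUMBER", "applies DESC, country ASC")

print(with_rank)        # [('Asia', 'Japan'), ('Europe', 'France'), ('Europe', 'Germany')]
print(with_row_number)  # [('Asia', 'Japan'), ('Europe', 'France')]
```

The alphabetical tie-breaker is an arbitrary choice for the sketch; in an interview, ask which tie rule the metric owner wants.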

Pitfall: Filtering the wrong table or wrong time column changes the metric definition; clarify whether the date applies to view time, post time, apply time, or metadata creation time.
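As a rough illustration, the same "January 2024" filter yields different answers depending on which timestamp it is applied to. The jobs and applies tables, their columns, and the dates are assumptions made up for this sketch.

```python
import sqlite3

# Hypothetical data: ISO date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, posted_at TEXT);
CREATE TABLE applies (apply_id INTEGER PRIMARY KEY, job_id INTEGER, applied_at TEXT);
INSERT INTO jobs VALUES (1, '2023-12-20'), (2, '2024-01-10');
INSERT INTO applies VALUES
    (1, 1, '2024-01-05'), (2, 1, '2024-02-01'),
    (3, 2, '2024-01-15'), (4, 2, '2024-02-10'), (5, 2, '2024-02-20');
""")

def count_applies(time_column):
    # time_column is a hard-coded literal below, never user input.
    # Half-open range [Jan 1, Feb 1) avoids end-of-month edge cases.
    sql = f"""
    SELECT COUNT(*)
    FROM applies a
    JOIN jobs j ON j.job_id = a.job_id
    WHERE {time_column} >= '2024-01-01' AND {time_column} < '2024-02-01'
    """
    return conn.execute(sql).fetchone()[0]

# "Applies made in January" filters on the apply time.
applies_made_in_jan = count_applies("a.applied_at")

# "Applies to jobs posted in January" filters on the post time.
applies_to_jobs_posted_in_jan = count_applies("j.posted_at")

print(applies_made_in_jan, applies_to_jobs_posted_in_jan)  # 2 3
```

Both queries are valid SQL; only the metric definition tells you which one answers the question.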

Practice these

The practice cards below cover the canonical variants — solve all of them and time yourself.
