SQL/Python Joins, Aggregations, And Window Functions
Asked of: Data Scientist
Last updated

What's being tested
These questions test relational data manipulation: joining behavioral logs to entity metadata, filtering by time or action type, deduplicating at the right grain, and aggregating into user-, country-, job-, or continent-level metrics. Interviewers are probing whether you can translate metric definitions into correct SQL or pandas without double-counting or losing edge cases.
Patterns & templates
-
Join event logs to dimensions with
INNER JOINorLEFT JOIN; confirm whether missing metadata should drop rows or remain asNULL. -
Deduplicate before aggregating using
COUNT(DISTINCT col),drop_duplicates, or a CTE at the metric grain; avoid counting repeated views as unique article types. -
Conditional aggregation with
SUM(CASE WHEN action='apply' THEN 1 ELSE 0 END)orCOUNT(*) FILTER (WHERE ...)for views, applies, posters, or applicants. -
Window ranking via
ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY metric DESC, tie_breaker ASC)for top country, first post, or deterministic tie handling. -
Histogram construction by first computing per-entity values, then grouping those values: user → distinct article types → count of users per diversity bucket.
-
Percentage-of-group metrics use
metric / SUM(metric) OVER (PARTITION BY group); cast to decimal to avoid integer division inSQL. -
Python equivalent:
merge, boolean filters,groupby().agg(),nunique(),rank(method='first'), andvalue_counts()cover most variants inpandas.
Common pitfalls
Pitfall: Aggregating after a many-to-one or many-to-many join without checking grain can inflate counts, especially for views, applies, or article categories.
Pitfall: Using
RANK()when the prompt expects exactly one row per group; preferROW_NUMBER()with explicit tie-breakers.
Pitfall: Filtering the wrong table or wrong time column changes the metric definition; clarify whether the date applies to view time, post time, apply time, or metadata creation time.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Featured in interview prep guides
Practice questions
- Compute article-type diversity per user and histogramLinkedIn · Data Scientist · Technical Screen · easy
- Find top countries by population per continentLinkedIn · Data Scientist · Technical Screen · easy
- Analyze member video posting behavior by countryLinkedIn · Data Scientist · Technical Screen · easy
- Count Article Types ViewedLinkedIn · Data Scientist · Technical Screen · medium
- Count viewed article typesLinkedIn · Data Scientist · Technical Screen · medium
- Analyze video posting activityLinkedIn · Data Scientist · Technical Screen · medium
- Identify Top Contributors by Recent Post CountLinkedIn · Data Scientist · Technical Screen · Medium
- [SQL] Job Ad Metrics with Applicant FilterLinkedIn · Data Scientist · Technical Screen · Medium
Related concepts
- SQL And Python Data ManipulationData Manipulation (SQL/Python)
- SQL Window Functions And Analytical QueryingData Manipulation (SQL/Python)
- SQL/Python Data Manipulation And JoinsData Manipulation (SQL/Python)
- SQL Analytics Joins, Aggregations, And Windows
- SQL Window Functions And AnalyticsData Manipulation (SQL/Python)
- SQL Analytical Querying And Data ModelingData Manipulation (SQL/Python)