SQL And Python Data Manipulation
Asked of: Data Scientist
Last updated
What's being tested
You’re being tested on analytics SQL/Python manipulation for product metrics: joining event tables, deduplicating entities, attributing outcomes to prior exposures, and aggregating by cohorts, geography, assignment, or time window. Meta interviewers are probing whether you can translate metric definitions into correct, auditable queries under messy event-log conditions.
Patterns & templates
-
Event attribution joins — join outcome events to prior exposure/assignment events using
`user_id`, timestamps, and bounded windows; avoid future-looking leakage. -
Deduplication with windows — use
`ROW_NUMBER()`OVER (PARTITION BY`entity_id`ORDER BY`ts`DESC) or earliest-event ordering for one record per user/session/object. -
Metric aggregation — compute
`COUNT(DISTINCT user_id)`,`SUM(revenue)`,`AVG(metric)`, and ratios like`conversions / assigned_users`with null-safe denominators. -
Cohort analysis — define cohort date from first qualifying event, then compute activity at
`D+1`,`D+7`, or weekly buckets using`DATE_DIFF`. -
Session-level rollups — aggregate impressions, clicks, calls, or comments to user/session before higher-level averages to avoid overweighting heavy users.
-
Conditional aggregation — use
`SUM(CASE WHEN condition THEN 1 ELSE 0 END)`or`COUNT_IF`to compute segmented metrics in one pass. -
Percentiles and bins — use
`PERCENTILE_CONT`,`APPROX_QUANTILES`,`NTILE`, or`CASE`bins for latency, position, frequency, and engagement distributions.
Common pitfalls
Pitfall: Joining raw impressions to raw clicks can multiply rows; aggregate or dedupe to the correct grain before computing
`CTR`.
Pitfall: Filtering after a
`LEFT JOIN`in the`WHERE`clause can silently turn it into an inner join and drop non-converters.
Pitfall: Mixing UTC event dates with local geography or cohort dates can shift users across days and bias retention or revenue metrics.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions
- Compute ad revenue metrics by geography in SQLMeta · Data Scientist · Technical Screen · medium
- Compute cohort GMV and payer rate with edge casesMeta · Data Scientist · Technical Screen · Medium
- Build DiD dataset with SQLMeta · Data Scientist · Technical Screen · Medium
- Write SQL for visibility, calls, and cohort activityMeta · Data Scientist · HR Screen · Medium
- Write SQL for retention, conversion, and churnMeta · Data Scientist · Onsite · Medium
- Calculate posts per DAU by country todayMeta · Data Scientist · Technical Screen · Medium
- Produce dating profile funnel report by cohortMeta · Data Scientist · Onsite · Medium
- Write SQL for comment analyticsMeta · Data Scientist · Onsite · Medium
- Compute feed ad frequency and retention in SQLMeta · Data Scientist · Onsite · Medium
- Join datasets and compute conversion by assignmentMeta · Data Scientist · Technical Screen · Medium
Related concepts
- SQL Window Functions And Analytical QueryingData Manipulation (SQL/Python)
- SQL/Python Joins, Aggregations, And Window FunctionsData Manipulation (SQL/Python)
- SQL Analytical Querying And Data ModelingData Manipulation (SQL/Python)
- SQL Analytics And Event Data ManipulationData Manipulation (SQL/Python)
- SQL Product AnalyticsData Manipulation (SQL/Python)
- Python/Pandas Data ManipulationData Manipulation (SQL/Python)