Write advanced SQL for sales support analytics
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Write SQL for the following schema and tasks. Assume ANSI SQL with DATE_TRUNC and INTERVAL supported. Sample tables (minimal rows shown).
accounts
+------------+-----------+------------+---------+
| account_id | industry | size_bucket| region |
+------------+-----------+------------+---------+
| 100 | Software | MM | West |
| 101 | Healthcare| SMB | East |
| 102 | Manufacturing| ENT | East |
| 103 | Retail | MM | Central |
| 104 | Finance | SMB | Central |
+------------+-----------+------------+---------+
reps
+--------+------------+--------+---------+----------+
| rep_id | hire_date | segment| region | quota_usd|
+--------+------------+--------+---------+----------+
| 10 | 2024-09-01 | SMB | West | 500000 |
| 11 | 2024-12-15 | MM | East | 800000 |
| 12 | 2025-03-10 | SMB | Central | 400000 |
+--------+------------+--------+---------+----------+
deals
+---------+------------+--------+---------------------+---------------------+------------+--------+------------+---------+
| deal_id | account_id | rep_id | created_at | closed_at | stage | is_won | amount_usd | region |
+---------+------------+--------+---------------------+---------------------+------------+--------+------------+---------+
| 1 | 100 | 10 | 2025-04-05 10:00:00 | 2025-04-20 09:00:00 | Closed Won | 1 | 12000 | West |
| 2 | 100 | 10 | 2025-04-25 12:00:00 | 2025-05-10 08:00:00 | Closed Lost| 0 | 8000 | West |
| 3 | 101 | 11 | 2025-05-02 11:00:00 | 2025-06-01 16:00:00 | Closed Won | 1 | 5000 | East |
| 4 | 102 | 11 | 2025-06-15 09:00:00 | 2025-07-20 10:00:00 | Closed Lost| 0 | 20000 | East |
| 5 | 103 | 12 | 2025-06-10 14:00:00 | 2025-06-22 10:00:00 | Closed Won | 1 | 15000 | Central |
| 6 | 104 | 12 | 2025-07-01 08:30:00 | 2025-07-18 12:00:00 | Closed Lost| 0 | 0 | Central |
+---------+------------+--------+---------------------+---------------------+------------+--------+------------+---------+
touches (duplicates possible on same day/channel)
+----------+------------+--------+------------+---------+--------------------+
| touch_id | account_id | rep_id | touch_date | channel | is_primary_contact |
+----------+------------+--------+------------+---------+--------------------+
| 1 | 100 | 10 | 2025-01-10 | email | 1 |
| 2 | 100 | 10 | 2025-04-01 | call | 1 |
| 3 | 100 | 10 | 2025-04-01 | call | 1 |
| 4 | 101 | 11 | 2025-02-15 | webinar | 0 |
| 5 | 101 | 11 | 2025-04-28 | call | 1 |
| 6 | 102 | 11 | 2025-06-16 | email | 1 |
| 7 | 103 | 12 | 2025-06-01 | call | 1 |
| 8 | 104 | 12 | 2025-06-29 | email | 1 |
+----------+------------+--------+------------+---------+--------------------+
Tasks:
A) For Q2 2025 (2025-04-01 to 2025-06-30 inclusive), by rep_id compute: closed_deals, wins, win_rate, avg_deal_amount_usd, avg_sales_cycle_days. Include only deals with stage like 'Closed%' and amount_usd > 0. Return rows sorted by win_rate desc; break ties by higher avg_deal_amount_usd.
B) Deduplicate touches by keeping the smallest touch_id per (account_id, touch_date, channel). Define first_touch_channel_2025 as the channel of the earliest deduped touch for that account in 2025. Join to deals closed in 2025 and produce a table: first_touch_channel, segment (from reps), closed_deals, wins, segment_win_rate, and then a final row per channel showing the segment-weighted win_rate across segments (weights = that channel's closed_deals per segment). Provide a single SQL that outputs both per-segment and overall rows (e.g., via GROUPING SETS or UNION ALL).
C) Compute, for each rep, the Pearson correlation r across Q2 2025 weeks between prior-week call_count (channel='call') and current-week win_rate. Define week as DATE_TRUNC('week', date). Show SQL that: (1) builds weekly call counts; (2) builds weekly win_rate for deals closed that week; (3) lags call_count by 1 week within rep; (4) aggregates per rep over weeks to compute r using the closed-form formula r = (n*sum(xy)-sum(x)sum(y)) / sqrt((n*sum(x^2)-sum(x)^2)*(n*sum(y^2)-sum(y)^2)). Return rep_id and r, excluding reps with n < 5 weeks of overlapping data.
D) Data quality checks: (i) list deals where closed_at < created_at; (ii) accounts whose first touch occurs after their first close date; (iii) any deal with region differing from its account's region. Provide SQL for each.
E) Last-7-days activity as of today: Use today = '2025-09-01'. For the window 2025-08-26 to 2025-09-01 inclusive, list top 3 reps by count of distinct accounts touched via calls (deduped rule above). Break ties by higher total call count, then lower rep_id. Provide SQL.
Quick Answer: This question evaluates practical SQL data-manipulation skills such as joins, aggregation, window functions, time-based filtering, deduplication, and computation of sales KPIs for rep-level analytics.