Write SQL for shop visibility and activity metric
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
Assume 'today' is 2025-09-01. Schema and tiny samples: 1) shops(shop_id INT, created_at DATE)
Sample:
shop_id | created_at
1 | 2025-08-25
2 | 2025-07-15
3 | 2025-08-30
4 | 2025-06-01
5 | 2025-08-05
2) listings(listing_id INT, shop_id INT, created_at DATE, is_visible TINYINT) -- is_visible is a 2025-09-01 snapshot
Sample:
listing_id | shop_id | created_at | is_visible
101 | 1 | 2025-08-26 | 1
102 | 1 | 2025-08-27 | 0
103 | 1 | 2025-08-29 | 1
104 | 2 | 2025-07-20 | 1
105 | 2 | 2025-07-25 | 1
106 | 3 | 2025-08-31 | 0
107 | 3 | 2025-08-31 | 0
108 | 4 | 2025-06-05 | 1
109 | 5 | 2025-08-10 | 1
3) shop_events(shop_id INT, event_date DATE, event_type VARCHAR, cnt INT) -- event_type in ('product_created','listing_updated','message_replied','order_placed')
Sample:
shop_id | event_date | event_type | cnt
1 | 2025-08-26 | product_created | 3
1 | 2025-08-29 | message_replied | 5
2 | 2025-08-27 | order_placed | 2
3 | 2025-08-31 | listing_updated | 4
3 | 2025-08-30 | order_placed | 1
4 | 2025-08-28 | product_created | 1
5 | 2025-08-21 | product_created | 2
5 | 2025-08-25 | message_replied | 1
Tasks: A) Write SQL to compute per-shop visibility_rate on 2025-09-01 = AVG(is_visible) over that shop’s listings; return only shops with at least 5 listings and rank by visibility_rate DESC, breaking ties by total listings DESC. B) Define a 'new shop' as created_at >= 2025-08-02 (last 30 days). Define activity_score over the window [2025-08-19, 2025-09-01] as 3*order_placed + 1*product_created + 0.5*listing_updated + 0.2*message_replied using counts from shop_events; shops with no events have score 0. Write SQL to output one row per shop with: shop_id, is_new, visibility_rate (from A; default NULL if <1 listing), activity_score. C) Using that output, write SQL to produce a 3-row summary: metric_name, new_shops_value, existing_shops_value, pct_diff for (i) mean activity_score, (ii) median activity_score (use an analytic function), and (iii) share_of_active_shops (fraction with activity_score > 0). D) Finally, list the top 5 new shops by activity_score whose visibility_rate < 0.5 to identify active-but-underexposed shops. Explain any assumptions you make about NULL vs 0 and how you’d prevent survivorship bias.
Quick Answer: This question evaluates SQL-based data manipulation and analytical skills, including aggregations, window functions, cohort comparisons, ranking, median calculation with analytic functions, and careful handling of NULLs versus zeros when constructing time-windowed metrics like visibility_rate and activity_score.