Write SQL to analyze shop visibility
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
You are given two tables. Use standard SQL (window functions allowed). Assume "today" is 2025-09-01 and that “currently visible” means a shop’s last state at or before 2025-09-01 23:59:59. Treat only changes where visibility flips (ignore consecutive duplicates).
Schema:
shop_visibility(profile_id BIGINT, date DATE, ts TIMESTAMP, visibility INT) — 0=hidden, 1=visible
shop_details(profile_id BIGINT, shop_category VARCHAR, visible_days BIGINT, creation_date DATE)
Small sample (illustrative only):
shop_visibility
+------------+------------+---------------------+------------+
| profile_id | date | ts | visibility |
+------------+------------+---------------------+------------+
| 101 | 2025-08-30 | 2025-08-30 09:00:00 | 1 |
| 101 | 2025-08-31 | 2025-08-31 11:00:00 | 0 |
| 101 | 2025-09-01 | 2025-09-01 10:15:00 | 1 |
| 102 | 2025-08-30 | 2025-08-30 08:00:00 | 1 |
| 102 | 2025-08-30 | 2025-08-30 12:00:00 | 1 |
| 102 | 2025-09-01 | 2025-09-01 18:00:00 | 0 |
| 103 | 2025-08-31 | 2025-08-31 09:00:00 | 0 |
| 103 | 2025-09-01 | 2025-09-01 20:00:00 | 0 |
| 104 | 2025-08-31 | 2025-08-31 05:00:00 | 0 |
| 104 | 2025-09-01 | 2025-09-01 09:00:00 | 1 |
| 104 | 2025-09-01 | 2025-09-01 12:30:00 | 0 |
| 104 | 2025-09-01 | 2025-09-01 16:45:00 | 1 |
| 105 | 2025-09-01 | 2025-09-01 07:00:00 | 1 |
| 106 | 2025-08-30 | 2025-08-30 06:00:00 | 0 |
| 106 | 2025-08-30 | 2025-08-30 14:00:00 | 1 |
| 106 | 2025-08-31 | 2025-08-31 14:05:00 | 0 |
| 106 | 2025-08-31 | 2025-08-31 14:10:00 | 1 |
| 106 | 2025-09-01 | 2025-09-01 02:00:00 | 0 |
| 106 | 2025-09-01 | 2025-09-01 22:00:00 | 1 |
shop_details
+------------+---------------+--------------+--------------+
| profile_id | shop_category | visible_days | creation_date|
+------------+---------------+--------------+--------------+
| 101 | Food | 180 | 2025-02-01 |
| 102 | Fashion | 120 | 2025-03-15 |
| 103 | Electronics | 30 | 2025-08-10 |
| 104 | Food | 75 | 2025-07-20 |
| 105 | Home | 200 | 2024-12-05 |
| 106 | Fashion | 10 | 2025-08-28 |
Write SQL for:
1) Top 10 profiles by number of visibility flips (count transitions where visibility != LAG(visibility) over ts). Break ties by smaller profile_id first.
2) Percentage of shops currently visible as of 2025-09-01 (numerator: profiles whose last state by ts on or before 2025-09-01 23:59:59 is 1; denominator: all profiles in shop_details). Return both percentage and counts.
3) Distribution of currently visible shops across categories: for each shop_category, return visible_count and its share among all currently visible shops.
4) Verify: “New users over time are visible far fewer days.” For each creation month, compute for every profile the fraction of days visible in its first 30 days since creation (using shop_visibility, deduping consecutive identical states, counting distinct dates with final visibility=1 per date), then return for each cohort: number of profiles, median fraction, and P75/P25. Include SQL that produces a cohort-level time series suitable for visualization. Briefly state how you would handle profiles created after 2025-08-02 (truncate the 30-day window) and how to test the trend statistically.
Quick Answer: This question evaluates proficiency in SQL-based time-series and event-data manipulation, including deduplication of consecutive states, window functions, aggregation, cohort analysis, and percentile/summary statistics.