Define and query shop visibility
Company: Meta
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
You are given the following schema. Use only the columns provided; do not introduce new fields or labels.
Tables and columns:
- shops(shop_id INT, shop_name TEXT, country TEXT)
- product_impressions(session_id TEXT, ts TIMESTAMP, product_id INT, shop_id INT, page TEXT, rank INT)
Sample data (ASCII):
shops
+---------+-----------+---------+
| shop_id | shop_name | country |
+---------+-----------+---------+
| 1 | Alpha | US |
| 2 | Beta | US |
| 3 | Gamma | CA |
| 4 | Delta | GB |
+---------+-----------+---------+
product_impressions
+------------+---------------------+------------+---------+-----------+------+
| session_id | ts | product_id | shop_id | page | rank |
+------------+---------------------+------------+---------+-----------+------+
| s1 | 2025-08-31 10:00:00 | 101 | 1 | home_feed | 1 |
| s1 | 2025-08-31 10:00:05 | 102 | 2 | home_feed | 7 |
| s2 | 2025-08-31 11:20:00 | 103 | 1 | home_feed | 3 |
| s3 | 2025-08-30 09:00:00 | 104 | 2 | home_feed | 2 |
| s3 | 2025-08-30 09:00:03 | 105 | 3 | home_feed | 6 |
| s4 | 2025-08-25 15:00:00 | 106 | 3 | home_feed | 5 |
| s5 | 2025-08-29 12:00:00 | 107 | 1 | search | 1 |
| s6 | 2025-08-31 14:00:00 | 108 | 2 | home_feed | 4 |
+------------+---------------------+------------+---------+-----------+------+
Define the daily Shop Visibility Score (SVS) for a shop as: among sessions that had at least one home_feed impression on that calendar date (UTC), the fraction of distinct sessions that saw ≥1 impression from that shop with rank <= 5 on home_feed the same date. A session can contribute to multiple shops' numerators if it saw multiple shops in top-5; the denominator is the distinct session count with any home_feed impression that date.
Tasks:
1) Write a single SQL query (CTEs allowed) that returns, for 2025-08-25 to 2025-08-31 inclusive, one row per (date, shop_id, SVS, numerator_sessions, denominator_sessions). Use only product_impressions and shops.
2) From your result, return the top 3 shops by 7-day average SVS; break ties by higher total home_feed impressions (same window).
3) Suppose rank can have gaps (e.g., no rank=1 for a given session). Explain how your query still correctly counts top-5 visibility or adjust it without adding columns.
4) Identify two failure modes where this SVS could be gamed or biased using only the given schema (e.g., repeated impressions within a session), and propose an SQL-only mitigation for each.
Quick Answer: This question evaluates a candidate's ability to design and compute a session‑level visibility metric using SQL and data manipulation techniques, covering joins, date/window handling, deduplication, aggregations, and metric validation.