How to evaluate similar-listing notifications feature
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: Technical Screen
## Case study (Marketplace product analytics)
**Context:** Circle is a US marketplace app for buying and selling second‑hand products. On a product listing page, a buyer can click **“send message”** to contact the seller. Each message sent counts as **one listing interaction**.
The team is considering (and then ships) a new feature on product listings:
- Buyers can opt into **reminders/notifications** such as “similar listings you may like.”
- When similar products become available, the buyer receives a notification.
### Part A — Should we build it?
How would you decide whether this is a good idea for the product? In your answer, cover:
- The user problem and hypothesis
- What data you would analyze *before* building (opportunity sizing)
- What success would look like and what could go wrong
- What MVP / rollout plan you would propose if you were uncertain
### Part B — It’s implemented. How do we measure impact?
The developers have shipped the functionality. How would you understand its impact and determine whether it is a successful feature?
Be specific about:
- **Primary success metric(s)** vs **diagnostic metrics** vs **guardrail metrics**
- Experiment or quasi-experiment design (unit of randomization, control, duration)
- Key pitfalls (selection bias from opt-in, notification fatigue, interference/network effects, seasonality)
- How you would interpret results and decide to iterate, roll out, or roll back
**Quick Answer:** This question evaluates a data scientist's product analytics and experimentation skills, including metric definition, causal inference, user segmentation, and measurement of engagement impact for a notification feature on a marketplace app.
## Solution
### Part A — Decide whether it’s worth building
#### 1) Clarify the goal and articulate hypotheses
A marketplace feature must ultimately improve marketplace health (liquidity, match rate) without harming user experience.
**Candidate goal:** Increase buyer-to-seller connections and purchases by helping buyers discover relevant inventory when it appears.
**Example hypotheses:**
- **H1 (engagement):** Similar-listing notifications increase buyer re-engagement and listing interactions (messages) per buyer.
- **H2 (conversion):** Notifications increase downstream conversions (purchases / completed transactions) and/or reduce time-to-purchase.
- **H3 (retention):** Notifications increase 7/28-day buyer retention.
**Risks / counter-hypotheses:**
- **Fatigue / spam:** More notifications → higher mute/uninstall, lower NPS.
- **Adverse selection:** Only highly engaged users opt in; feature may not generalize.
- **Cannibalization:** Users might delay purchases waiting for “better” similar listings.
- **Marketplace interference:** Promoting some listings may reduce exposure for others (network effects / fairness concerns).
#### 2) Pre-build opportunity sizing (use existing data)
You want to estimate *headroom* and where the feature could matter most.
**Analyses (examples):**
- **Search + browse unmet demand:** Sessions where a buyer views many listings but sends 0 messages (high intent, low match).
- **Inventory arrival rate:** For common categories, how often do “similar” items appear after a user views an item? If similar inventory is sparse, notifications won’t trigger often enough to matter.
- **Time-to-message / time-to-purchase:** If buyers often return days later to message, reminders could accelerate actions.
- **Repeat interest patterns:** % of users who view the same category/keywords repeatedly over days.
- **Current notification baseline:** Existing push/email volume and opt-out rates: is there headroom to add notifications without driving opt-outs?
**Back-of-the-envelope sizing:**
Let
- \(N\) = daily users who view listings,
- \(p\) = fraction with high intent but no interaction,
- \(r\) = fraction who would opt in,
- \(t\) = expected notifications per opted-in user per day,
- \(c\) = incremental click-through rate from notification to listing view,
- \(m\) = incremental message rate per notification-driven view.
Estimated incremental messages/day \(\approx N \cdot p \cdot r \cdot t \cdot c \cdot m\).
If this is tiny, deprioritize.
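A minimal sketch of this arithmetic in Python; every input value below is an illustrative assumption, not a measured number:

```python
# Hypothetical inputs -- replace each with values measured from your own logs.
N = 500_000   # daily users who view listings
p = 0.30      # fraction with high intent but no interaction
r = 0.20      # fraction who would opt in to notifications
t = 1.5       # expected notifications per opted-in user per day
c = 0.08      # incremental click-through rate (notification -> listing view)
m = 0.10      # incremental message rate per notification-driven view

incremental_messages_per_day = N * p * r * t * c * m
print(f"Estimated incremental messages/day: {incremental_messages_per_day:,.0f}")  # ~360 here
```

If the estimate stays small even under optimistic assumptions, that is a strong signal to deprioritize.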
#### 3) Define “similar listing” and feasibility constraints
This is both product and data/ML:
- Similarity definition: category + price range + location radius + attributes (brand/size) + embeddings; a rule-based version is sketched after this list.
- Cold start: new users and sparse categories.
- Latency + triggering: real-time vs batch; limits per user/day.
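To make rule-based similarity concrete, here is a hedged sketch; the `Listing` fields, price band, and radius are illustrative assumptions rather than Circle's actual schema:

```python
from dataclasses import dataclass
import math

@dataclass
class Listing:
    category: str
    price: float
    lat: float
    lon: float

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_similar(viewed: Listing, candidate: Listing,
               price_band: float = 0.25, radius_km: float = 50.0) -> bool:
    """Rule-based similarity: same category, price within +/-25%, within 50 km."""
    same_category = viewed.category == candidate.category
    in_price_band = abs(candidate.price - viewed.price) <= price_band * viewed.price
    nearby = haversine_km(viewed.lat, viewed.lon, candidate.lat, candidate.lon) <= radius_km
    return same_category and in_price_band and nearby
```

An embedding-based similarity score can replace or augment these rules later without changing the notification trigger.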
#### 4) MVP / rollout if uncertain
A good approach is to reduce build risk:
- **MVP:** Rule-based similarity (same category + price band + geo), opt-in on listing page.
- **Safeguards:** frequency caps (e.g., ≤2/day), quiet hours, easy unsubscribe (gating logic sketched below).
- **Phased rollout:** internal → 1% → 10% → 50% with monitoring.
- **Pre-registered success criteria:** decide ahead of time what lift/guardrails are required.
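A sketch of the notification-gating safeguards, under assumed thresholds (cap of 2/day, quiet hours 22:00 to 08:00 local time); the real caps would be tuned from opt-out and complaint data:

```python
from datetime import datetime

MAX_PER_DAY = 2                  # assumed frequency cap
QUIET_START, QUIET_END = 22, 8   # assumed quiet hours (local time)

def can_notify(sent_today: int, local_now: datetime) -> bool:
    """Send only if under the daily cap and outside quiet hours."""
    under_cap = sent_today < MAX_PER_DAY
    in_quiet_hours = local_now.hour >= QUIET_START or local_now.hour < QUIET_END
    return under_cap and not in_quiet_hours
```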
---
### Part B — Measure impact after shipping
#### 1) Choose metric stack: primary, diagnostic, guardrails
Because “messages” is an intermediate metric, use a hierarchy.
**Primary (choose 1–2):**
- **Incremental purchases / GMV per active buyer** (best if reliably measured)
- If purchases are rare/delayed: **listing interactions (messages) per active buyer** as a proxy, but validate linkage to purchase.
**Diagnostic metrics (to explain the why):**
- Notification **deliveries**, **open rate**, **CTR to listing**, **view-to-message rate**
- **Time to next session** after a listing view
- Funnel: notification → listing view → message → purchase
- **Seller-side effects:** messages received per seller, response rate, conversion rate
**Guardrails (must not worsen):**
- Push/email opt-out rate, mute rate
- App uninstall rate, DAU drop among exposed users
- User complaints, support tickets
- Spam blocks / reporting
- Notification volume per user (distribution, not just mean)
- Marketplace fairness indicators (e.g., exposure concentration / Gini of impressions)
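One way to monitor the exposure-concentration guardrail is a Gini coefficient over per-listing impressions, compared between arms each week. A minimal sketch (NumPy assumed):

```python
import numpy as np

def gini(impressions: np.ndarray) -> float:
    """Gini of listing impressions: 0 = exposure evenly spread, 1 = fully concentrated."""
    x = np.sort(np.asarray(impressions, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n)

print(gini(np.array([10, 10, 10, 10])))  # 0.0  -- equal exposure
print(gini(np.array([0, 0, 0, 40])))     # 0.75 -- concentrated on one listing
```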
#### 2) Preferred approach: randomized controlled experiment (A/B test)
**Unit of randomization:** typically **user-level** (buyer). If you randomize at the notification-event level, the same user can receive both treated and untreated notifications, contaminating the comparison.
**Treatment:** user is eligible for similar-listing notifications (and sees opt-in flow or default settings).
**Control:** no similar-listing notifications (or placebo messaging if needed).
**Key design choices:**
- **Opt-in selection bias:** If only treated users can opt in, the ITT (intent-to-treat) effect is clean, but a naive comparison of opted-in users to everyone else is biased. Report:
  - **ITT:** the effect of offering eligibility.
  - Optionally, **TOT** (treatment-on-the-treated) via instrumental variables, using eligibility as the instrument for opt-in, with the assumptions stated explicitly.
- **Duration:** long enough to capture repeats and delayed purchases (often 2–4 weeks minimum), plus check novelty effects.
- **Power/MDE:** compute required sample size based on baseline purchase/message rate variance.
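A power/MDE sketch with statsmodels, treating the primary metric as a per-user conversion rate; the baseline rate and relative MDE below are placeholders, not measured values:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05        # assumed baseline rate (e.g., buyers who purchase in the window)
mde_rel = 0.05         # smallest relative lift worth detecting (5%)
target = baseline * (1 + mde_rel)

effect_size = proportion_effectsize(target, baseline)   # Cohen's h
n_per_arm = NormalIndPower().solve_power(effect_size=effect_size,
                                         alpha=0.05, power=0.8,
                                         alternative="two-sided")
print(f"Required users per arm: {n_per_arm:,.0f}")
```

For a count metric such as messages per buyer, the same logic applies with the metric's observed variance instead of a proportion effect size.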
**Primary analysis:**
- Compare mean outcome per user over the test window.
- Use robust SEs; consider CUPED (pre-period outcome as covariate) to reduce variance.
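A minimal CUPED sketch, assuming one row per user with pre-period and experiment-period message counts (column names are illustrative):

```python
import numpy as np
import pandas as pd
from scipy import stats

def cuped_adjust(df: pd.DataFrame, y: str = "messages", y_pre: str = "messages_pre") -> pd.Series:
    """CUPED: y - theta * (y_pre - mean(y_pre)), with theta = cov(y, y_pre) / var(y_pre)."""
    theta = df[[y, y_pre]].cov().loc[y, y_pre] / df[y_pre].var()
    return df[y] - theta * (df[y_pre] - df[y_pre].mean())

# Synthetic data purely to illustrate the mechanics.
rng = np.random.default_rng(0)
n = 10_000
pre = rng.poisson(2.0, n)
treated = rng.integers(0, 2, n)
post = pre + rng.poisson(0.5, n) + 0.1 * treated        # true lift = 0.1 messages/user
df = pd.DataFrame({"treated": treated, "messages": post, "messages_pre": pre})

df["y_cuped"] = cuped_adjust(df)
t, p = stats.ttest_ind(df.loc[df.treated == 1, "y_cuped"],
                       df.loc[df.treated == 0, "y_cuped"],
                       equal_var=False)                  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.4f}")
```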
#### 3) Pitfalls and how to handle them
- **Interference / network effects:** A treated buyer messaging a seller can change seller behavior and inventory dynamics, so treatment may indirectly affect control users. Mitigations:
  - Randomize by geo-market (cluster) for a sensitivity analysis, or run a holdout geo.
  - Measure seller-level spillovers.
- **Seasonality/holidays:** Use contemporaneous control; avoid pre/post without control.
- **Multiple testing:** Pre-register primary metric; adjust or clearly label exploratory metrics.
- **Notification fatigue over time:** Examine treatment effect by week (week 1 vs week 4), and by notification frequency buckets.
#### 4) If A/B isn’t possible: quasi-experimental alternatives
- **Difference-in-differences** using a phased rollout across geos (sketch after this list).
- **Interrupted time series** with control series.
- **Propensity matching** is weaker here due to opt-in bias; only use as supplementary.
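For the difference-in-differences option, a sketch with statsmodels on a synthetic geo-week panel; `treated_geo`, `post`, and the panel itself are assumptions standing in for the real rollout data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic geo-week panel: half the geos get the feature from week 4 onward.
rng = np.random.default_rng(1)
geos, weeks = 40, 8
panel = pd.DataFrame(
    [(g, w, int(g < 20), int(w >= 4)) for g in range(geos) for w in range(weeks)],
    columns=["geo", "week", "treated_geo", "post"],
)
panel["messages_per_buyer"] = (
    1.0 + 0.10 * panel["treated_geo"] + 0.05 * panel["week"]
    + 0.08 * panel["treated_geo"] * panel["post"]        # true DiD effect = 0.08
    + rng.normal(0, 0.05, len(panel))
)

did = smf.ols("messages_per_buyer ~ treated_geo * post", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["geo"]}   # cluster SEs by geo
)
print(did.params["treated_geo:post"], did.bse["treated_geo:post"])
```

The key identification assumption is parallel trends, which the pre-rollout weeks can be used to check.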
#### 5) Decision rule and next steps
Define thresholds *beforehand*, e.g.:
- Roll out if **primary metric lift** is statistically and practically meaningful (e.g., +1–2% purchases per buyer or +X% messages) **and** guardrails do not degrade beyond limits (e.g., opt-out ≤ +0.2pp, uninstall not up).
- Iterate if engagement lifts but conversion doesn’t: refine similarity quality, notification timing, frequency caps.
- Roll back if guardrails worsen materially or lift is concentrated in a tiny segment with broad negative externalities.
**Segmentation for insights:**
- New vs existing buyers
- Categories (high-liquidity vs low-liquidity)
- Urban vs rural (inventory density)
- High-intent users (many views, no messages) vs casual browsers
Overall, you judge success by **incremental marketplace outcomes** (purchase/GMV, match rate), supported by a healthy notification funnel, with strong guardrail protection against fatigue and negative marketplace spillovers.