PracHub

How to evaluate a similar-listing notifications feature

Last updated: Jun 15, 2026

Quick Overview

A Meta Data Scientist analytics-and-experimentation interview question: a US C2C second-hand marketplace is considering opt-in “similar listings” notifications. You must (1) decide whether to build it — hypotheses, opportunity sizing, risks, MVP plan — and (2) design the post-launch measurement: a user-randomized A/B test with a primary/diagnostic/guardrail metric stack, handling of opt-in selection bias, and a clear launch/rollback rule.

Company: Meta

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: easy

Interview Round: Technical Screen

##### Question

You are a Data Scientist on a US C2C marketplace app (like Facebook Marketplace) where users buy and sell second-hand products.

**Current product behavior**

- Users browse product listings.
- If a buyer is interested in a listing, they can click **“Send message”** to contact the seller.
- Each message sent counts as **one listing interaction**.

**Proposed feature**

On a product listing, buyers can opt into **reminders/notifications** for **“similar listings you may like.”** When similar products become available, the buyer receives a notification.

Answer the following:

1. **Pre-launch / decision framing.** How would you decide whether this feature is a good idea for the product? Cover:
   - The user problem and hypothesis you are testing.
   - What success metrics you would expect to move (and why), and how you would distinguish primary vs. diagnostic vs. guardrail metrics.
   - Key tradeoffs and risks (e.g., notification fatigue, adverse selection, cannibalization of search).
   - What data you would analyze *before* building to validate demand and size the opportunity (e.g., backtesting against historical logs).
   - What MVP / phased rollout plan you would propose if you were uncertain.
2. **Post-implementation impact evaluation.** Assume engineers have shipped the functionality (or it can be enabled for some users). How would you measure its impact and determine whether it is successful? Be specific about:
   - The recommended experiment or causal design, the unit of randomization, control vs. treatment, and duration.
   - Primary success metric(s) vs. secondary/diagnostic metrics vs. guardrail metrics.
   - Key pitfalls (opt-in selection bias, notification fatigue, interference/network effects, seasonality, attribution) and how you would handle them.
   - How you would interpret results and decide to iterate, roll out, or roll back.

Solution

### Part 1: Decide whether it’s worth building

#### 1) Clarify the goal and articulate hypotheses

A marketplace feature must ultimately improve marketplace health (liquidity and buyer–seller match rate) without harming the user experience. The candidate goal here: increase buyer-to-seller connections and purchases by helping buyers discover relevant inventory when it appears.

Example hypotheses:

- **H1 (engagement/liquidity):** Similar-listing notifications increase buyer re-engagement and listing interactions (messages) per buyer.
- **H2 (conversion efficiency):** Notifications increase downstream conversions (purchases / completed transactions), reduce time-to-purchase, and/or raise message-per-view among high-intent sessions.
- **H3 (retention):** Notifications bring users back, improving 7/28-day buyer retention.
- **H4 (risk / counter-hypothesis):** Excess or irrelevant notifications increase mute/opt-out/uninstall and reduce long-run engagement (fatigue).

Other risks to name:

- **Adverse selection:** only highly engaged users opt in, so the effect may not generalize.
- **Cannibalization:** users delay purchases waiting for a “better” similar listing, or notifications merely shift demand away from organic search without growing total transactions.
- **Marketplace interference:** promoting some listings reduces exposure for others (network effects / fairness concerns); more buyer messages can also overload sellers with low-quality inquiries.

#### 2) Pre-build opportunity sizing (use existing data)

The aim is to estimate *headroom* and where the feature could matter most before investing heavily.

- **Unmet demand:** sessions where a buyer views many listings but sends 0 messages (high intent, low match).
- **Inventory arrival rate:** for common categories, how often do “similar” items appear after a user views an item? If similar inventory is sparse, notifications won’t trigger enough to matter.
- **Time-to-message / time-to-purchase:** if buyers often return days later to message, reminders could accelerate actions.
- **Repeat-interest patterns:** % of users who view/save/search the same category or keywords repeatedly over days.
- **Notification baseline:** existing push/email volume and opt-out rates. Can we add more without harming?
- **Backtest against logs:** identify users with repeated intent signals, then simulate: if we had notified them when similar inventory appeared, how often would there have been a plausible “match”? Evaluate a relevance proxy offline (e.g., precision@k using historic co-click / co-message patterns).

**Back-of-the-envelope sizing.** Let:

- N = daily users who view listings
- p = fraction with high intent but no interaction
- r = fraction who would opt in
- t = expected notifications per opted-in user per day
- c = incremental click-through-to-view rate
- m = incremental message rate per notification-driven view

Then estimated incremental messages/day ≈ N · p · r · t · c · m. If this is tiny, deprioritize.

#### 3) Define “similar listing” and feasibility constraints

This is both a product and a data/ML problem:

- **Similarity definition:** category + price band + location radius + attributes (brand/size) + embeddings.
- **Cold start:** new users and sparse categories.
- **Latency and triggering:** real-time vs. batch; per-user/day caps.

#### 4) MVP / rollout if uncertain

- **MVP:** rule-based similarity (same category + price band + geo), opt-in on the listing page.
- **Safeguards:** frequency caps (e.g., ≤2/day), quiet hours, easy unsubscribe.
- **Phased rollout:** internal → 1% → 10% → 50% with monitoring.
- **Pre-registered success criteria:** decide ahead of time what lift and guardrail bounds are required.
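
The back-of-the-envelope sizing from section 2 can be turned into a quick scenario calculation. The sketch below is only illustrative: the function name, the scenario labels, and every input value are hypothetical placeholders, not figures from any real marketplace. In practice each input would come from the log analyses listed in section 2.

```python
def incremental_messages_per_day(n, p, r, t, c, m):
    """Estimate incremental messages/day as N * p * r * t * c * m.

    n: daily users who view listings
    p: fraction with high intent but no interaction
    r: fraction who would opt in
    t: expected notifications per opted-in user per day
    c: incremental click-through-to-view rate
    m: incremental message rate per notification-driven view
    """
    return n * p * r * t * c * m


# Hypothetical inputs, chosen only to illustrate the arithmetic.
scenarios = {
    "conservative": dict(n=1_000_000, p=0.20, r=0.05, t=0.5, c=0.05, m=0.05),
    "optimistic": dict(n=1_000_000, p=0.30, r=0.15, t=1.0, c=0.10, m=0.10),
}

for name, inputs in scenarios.items():
    estimate = incremental_messages_per_day(**inputs)
    print(f"{name}: about {estimate:.1f} incremental messages/day")
```

Because the estimate is a product of six uncertain terms, small changes in each input compound quickly. Comparing a conservative and an optimistic scenario is usually more informative than a single point estimate.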

---

### Part 2: Measure impact after shipping

#### 1) Metric stack: primary, diagnostic, guardrails

Because “messages” is an intermediate metric, use a hierarchy rather than optimizing CTR alone (CTR can rise while marketplace health falls if notifications are spammy).

**Primary (pick 1–2, pre-registered):**

- **Incremental purchases / GMV per active (or eligible) buyer:** best if reliably measured.
- If purchases are rare or delayed: **listing interactions (messages) per eligible user** over a fixed window (e.g., 7 days) as a proxy, validated against downstream purchase. A robustness option: count only messages in threads that pass an intent threshold (e.g., seller reply), to avoid rewarding low-quality inquiries.

**Diagnostic / secondary (explain the “why”):**

- Notification deliveries, open rate, CTR to listing, view-to-message rate.
- Funnel: notification → listing view → message → seller reply → purchase.
- Time-to-next-session after a listing view; sessions per user.
- Search-usage change (does the feature complement or cannibalize search?).
- Seller-side effects: messages received per seller, response rate, conversion rate.

**Guardrails (must not worsen):**

- Notification opt-out / settings-disable rate, mute rate.
- App uninstall rate; DAU among exposed users.
- Spam/report/block rate; support tickets.
- Notification volume per user (the distribution, not just the mean).
- Seller burden (response rate, seller churn) and marketplace fairness (e.g., exposure concentration / Gini of impressions).

#### 2) Preferred approach: randomized controlled experiment (A/B test)

**Unit of randomization:** the **user** (buyer), to avoid cross-session/device contamination.

- **Control:** no similar-listing notifications (feature hidden, or placebo messaging if needed).
- **Treatment:** feature enabled and notifications sent.
- **Eligibility / denominator:** define it clearly, e.g., users who viewed ≥ N listings in a category, or messaged a seller but did not transact, or saved items.

**Handling opt-in selection bias.** If the user must opt in, do **not** naively compare opt-in vs. non-opt-in users; that confounds the feature with user intent. Instead:

- **Encouragement design:** randomize who *sees* the opt-in prompt (or who is eligible), and measure the **ITT (intent-to-treat)** effect of offering the feature. This is the clean primary readout.
- Optionally recover the **TOT** (effect on those who actually opt in) via instrumental variables, using eligibility/prompt as the instrument for opt-in, and state the exclusion-restriction assumptions explicitly.
- Alternatively, randomize **notification sending** among already-opted-in users. This is cleaner for measuring notification value, but it does not measure the value of the opt-in UI itself.

**Duration:** long enough to capture repeat visits and delayed purchases (typically 2–4 weeks minimum) and to see past novelty effects.

**Power / MDE:** size the test from the primary-metric variance. “Messages per user” is often zero-inflated, so consider a longer window, stratification by baseline activity, and **CUPED** (use pre-period messaging as a covariate) to cut variance. Use robust standard errors and compare per-user outcomes over the window.

**Attribution:** rely on **user-level totals** (messages/purchases per user) to capture net lift; report notification-driven sessions only as interpretive color. Avoid crediting outcomes purely on last-click.

#### 3) Pitfalls and how to handle them

- **Interference / network effects:** a treated buyer messaging a seller changes seller behavior and inventory, which can spill over to control users. Mitigate with a **cluster-randomized (geo-market) sensitivity arm** or a holdout geo, and measure seller-level spillovers.
- **Seasonality / holidays:** always use a contemporaneous control; avoid pre/post without a control group.
- **Multiple testing:** pre-register the primary metric; adjust or clearly label exploratory metrics.
- **Fatigue over time:** examine the treatment effect by week (week 1 vs. week 4) and by notification-frequency bucket.

#### 4) If a clean A/B test isn’t possible

Fall back to quasi-experiments, naming residual confounding:

- **Difference-in-differences** with a staggered rollout across geos/platforms/time.
- **Interrupted time series** with a control series.
- **Regression discontinuity** if notifications trigger above a threshold (e.g., a saved-search count).
- **Propensity matching** only as supplementary; it is weak here because of opt-in bias.

#### 5) Decision rule and segmentation

Set thresholds beforehand, e.g.: roll out if the **primary-metric lift** is statistically and practically meaningful (e.g., +1–2% purchases per buyer, or +X% messages) **and** guardrails stay within bounds (e.g., opt-out ≤ +0.2pp, uninstall not up, seller reply rate stable).

Common readouts:

- **CTR up, messages/purchases flat:** clickbait / low-intent notifications. Check view-to-message and seller reply.
- **Messages up, seller reply down / reports up:** low-quality inquiries. Refine relevance, add friction (e.g., saved search), or cap frequency.
- **Short-term lift, long-term retention decline:** fatigue. Enforce caps, personalization, snooze, category controls.
- **Heterogeneous effects:** segment by category supply density, price tier, intent (new vs. returning), and geography (urban vs. rural inventory density); roll out only to net-positive segments.

Overall, success is judged by **incremental marketplace outcomes** (purchases/GMV, match rate), supported by a healthy notification funnel and strong guardrail protection against fatigue and negative marketplace spillovers.
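
The CUPED adjustment mentioned under Power / MDE can be sketched in a few lines. This is a minimal illustration on synthetic data, not a production analysis: the helper names (`cuped_adjust`, `diff_in_means`), the simulated distributions, and the injected effect of 0.1 are all assumptions made for the example. The idea is to subtract the part of each user's outcome that is predictable from pre-period messaging, which leaves the expected treatment effect unchanged while shrinking variance.

```python
import random
from statistics import fmean, variance


def cuped_adjust(post, pre):
    """Return CUPED-adjusted outcomes: post_i - theta * (pre_i - mean(pre))."""
    pre_mean, post_mean = fmean(pre), fmean(post)
    # Sample covariance between the pre-period covariate and the outcome.
    cov = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post)) / (len(pre) - 1)
    theta = cov / variance(pre)  # estimated on pooled data from both arms
    return [y - theta * (x - pre_mean) for x, y in zip(pre, post)]


def diff_in_means(values, arms):
    """Treatment mean minus control mean (arms: 1 = treatment, 0 = control)."""
    treated = [v for v, a in zip(values, arms) if a == 1]
    control = [v for v, a in zip(values, arms) if a == 0]
    return fmean(treated) - fmean(control)


# Synthetic data for illustration only; every number below is made up.
random.seed(0)
n_users = 2000
arms = [random.randint(0, 1) for _ in range(n_users)]
pre_msgs = [random.expovariate(1.0) for _ in range(n_users)]
post_msgs = [0.8 * x + random.gauss(0.0, 0.5) + 0.1 * a for x, a in zip(pre_msgs, arms)]

adjusted = cuped_adjust(post_msgs, pre_msgs)
print("raw effect:    ", round(diff_in_means(post_msgs, arms), 3))
print("CUPED effect:  ", round(diff_in_means(adjusted, arms), 3))
print("variance ratio:", round(variance(adjusted) / variance(post_msgs), 3))
```

The covariate must be measured before assignment so that it cannot be affected by treatment. The stronger the correlation between pre-period and in-experiment messaging, the larger the variance reduction.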

Explanation

Rubric: the strongest answers (1) tie success to marketplace health (liquidity/GMV/match rate) rather than CTR, (2) lay out a primary/diagnostic/guardrail metric hierarchy, (3) propose a user-randomized A/B test and explicitly defuse opt-in selection bias via an encouragement/ITT design, and (4) name marketplace-specific threats — interference/network effects, fatigue, cannibalization, seasonality — with concrete mitigations and a pre-registered decision rule.
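
The encouragement/ITT design named in the rubric can be illustrated with a small worked example. The sketch below uses a tiny hypothetical dataset (all records are invented) in which only users offered the prompt can opt in. It contrasts the naive opt-in vs. non-opt-in gap with the ITT effect and with the Wald (instrumental-variable) ratio, which equals the effect on opted-in users under the usual IV assumptions. The function names are illustrative, not from any library.

```python
from statistics import fmean

# Hypothetical per-user records: (offered_prompt, opted_in, messages_in_window).
users = (
    [(1, 1, y) for y in [3, 2, 4, 3]]            # offered and opted in
    + [(1, 0, y) for y in [1, 0, 1, 0, 1, 1]]    # offered, did not opt in
    + [(0, 0, y) for y in [1, 0, 2, 1, 1, 0, 1, 2, 1, 1]]  # not offered
)


def itt_and_wald(rows):
    """Return (ITT effect, opt-in uplift from the offer, Wald estimate)."""
    offered = [r for r in rows if r[0] == 1]
    held_out = [r for r in rows if r[0] == 0]
    # ITT: effect of being offered the feature, regardless of opt-in.
    itt = fmean([r[2] for r in offered]) - fmean([r[2] for r in held_out])
    # First stage: how much the offer raises the opt-in rate.
    first_stage = fmean([r[1] for r in offered]) - fmean([r[1] for r in held_out])
    return itt, first_stage, itt / first_stage


def naive_opt_in_gap(rows):
    """Biased comparison: mean outcome of opted-in users minus everyone else."""
    opted = [r[2] for r in rows if r[1] == 1]
    others = [r[2] for r in rows if r[1] == 0]
    return fmean(opted) - fmean(others)


itt, first_stage, wald = itt_and_wald(users)
print(f"naive opt-in gap: {naive_opt_in_gap(users):.3f}")
print(f"ITT effect:       {itt:.3f}")
print(f"opt-in uplift:    {first_stage:.3f}")
print(f"Wald estimate:    {wald:.3f}")
```

In this toy data the naive gap is larger than the Wald estimate because opted-in users were already more active, which is exactly the selection bias the randomized offer is meant to remove. The Wald ratio relies on the exclusion restriction: the offer affects outcomes only through opting in.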

Jan 17, 2026, 12:00 AM