Notifications And Push Notification Analytics
Asked of: Data Scientist

What's being tested
Interviewers are probing whether you can evaluate a notification product as a causal inference and measurement design problem, not just as a dashboarding exercise. Strong answers define who is eligible, what counts as exposure, what success means, and how to separate incremental value from users who were already highly engaged. Meta cares because notifications can increase marketplace liquidity, message responses, and retention, but they can also create fatigue, opt-outs, spam reports, and long-term engagement decay. The interviewer is looking for disciplined tradeoffs: growth versus user experience, short-term clicks versus durable marketplace outcomes, and individual-level effects versus network or marketplace spillovers.
Core knowledge
- Notification funnels should be decomposed into eligibility, send, delivery, impression/open, click, landing-page engagement, downstream action, and long-term retention. For a marketplace, downstream metrics might include `listing_view`, `save`, `seller_message`, `offer_sent`, `purchase_intent`, or `transaction_proxy`, not just `notification_click_rate`.
- Primary metrics should reflect the product’s intended causal mechanism. For similar-listing alerts, a better primary metric than `CTR` may be incremental `qualified_listing_views_per_user` or `buyer_seller_message_threads_per_eligible_user`, because clickbait notifications can raise clicks while lowering marketplace quality or trust.
- Guardrail metrics are essential for push notifications because the treatment imposes attention costs. Common guardrails include `push_opt_out_rate`, `notification_disable_rate`, `app_uninstall_rate`, `hide_report_rate`, `negative_feedback_rate`, `session_depth`, `7d_retention`, and total notification volume per user.
- Randomization unit is usually the user for independent notification eligibility, but cluster randomization may be needed when users interact, share devices, belong to households, or participate in marketplaces with supply-demand interference. Randomizing at the event level lets the same user fall into both arms, which risks cross-contamination and a confusing user experience.
- Eligibility definition must be fixed before analysis: for example, users who viewed or saved a marketplace item in the past 7 days, have push permissions enabled, and have at least one similar listing available. Analyze both `intent-to-treat` over eligible randomized users and treatment-on-treated for exposed users, treating the latter with caution because exposure itself is not randomized.
- Power and MDE should be discussed at the user level, not notification level, because multiple sends to the same person are correlated. A basic minimum detectable effect is approximately $\mathrm{MDE} \approx (z_{1-\alpha/2} + z_{1-\beta})\sqrt{2\sigma^2 / n}$ for equal-sized arms of $n$ users each, where $\sigma^2$ is the outcome variance; clustered designs inflate variance.
- Clustered experiments require the design effect: $\mathrm{DEFF} = 1 + (m - 1)\rho$, where $m$ is average cluster size and $\rho$ is intracluster correlation. If households, social clusters, or geographic markets have $(m - 1)\rho \approx 1$ (for example, $m = 11$ and $\rho = 0.1$), effective sample size drops by roughly half.
- CUPED can reduce variance using pre-experiment covariates, especially historical marketplace engagement or prior notification responsiveness. The adjusted metric is $Y_{\text{cuped}} = Y - \theta (X - \bar{X})$, where $X$ is the pre-experiment covariate and $\theta = \operatorname{Cov}(Y, X) / \operatorname{Var}(X)$; it helps most when pre-period and post-period behavior are highly correlated.
- Multiple testing matters when slicing by country, platform, buyer/seller role, notification type, or engagement cohort. Pre-register a primary metric and use corrections like Holm-Bonferroni or control false discovery rate with Benjamini-Hochberg for exploratory subgroup analysis.
- Heterogeneous treatment effects are often central for notifications. New users may need helpful prompts, while power users may experience fatigue. Segment by notification permission status, historical open rate, marketplace intent, inventory density, platform, and prior mute/negative feedback behavior.
- Interference and cannibalization are common. Similar-listing notifications may shift views from organic feed, search, saved items, or other notifications rather than create new demand. Measure incremental total marketplace engagement, not only engagement attributable to the new notification surface.
- Unread-rate analysis should be user-centered. For multi-account or multi-device users, compute metrics such as `unread_notifications_per_user`, `users_with_unread_rate_gt_50pct`, or bucketed account counts carefully; otherwise heavy users dominate averages and mask whether the feature worsens notification overload.
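The power-related bullets above (MDE, design effect, CUPED) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production power calculator: the function names `mde_two_arm`, `design_effect`, and `cuped_adjust` are hypothetical, the MDE uses a normal approximation with a two-sided test, and the example inputs (unit variance, 500,000 users per arm, cluster size 11, ICC 0.1) are invented for illustration.

```python
import math
from statistics import NormalDist, mean


def mde_two_arm(sigma: float, n_per_arm: int,
                alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate absolute MDE for a two-sided test with equal-sized arms."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(2 * sigma ** 2 / n_per_arm)


def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Variance inflation from randomizing clusters instead of users."""
    return 1 + (avg_cluster_size - 1) * icc


def cuped_adjust(y: list, x: list) -> list:
    """Return y - theta * (x - mean(x)) with theta = Cov(y, x) / Var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var = sum((xi - x_bar) ** 2 for xi in x)
    theta = cov / var
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Illustrative inputs only (assumptions, not real experiment numbers).
deff = design_effect(avg_cluster_size=11, icc=0.1)
n_effective = int(500_000 / deff)  # effective users per arm under clustering
print("design effect:", deff)
print("MDE, user-level randomization:", mde_two_arm(1.0, 500_000))
print("MDE, clustered randomization:", mde_two_arm(1.0, n_effective))
```

In a real experiment, `theta` would typically be estimated on pooled treatment and control data so that both arms receive the same adjustment and the difference in means stays unbiased.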
Worked example
For “How to evaluate similar-listing notifications feature,” start by clarifying the product goal: are we trying to increase buyer discovery, accelerate marketplace transactions, or re-engage users who showed shopping intent? Then define the eligible population: users who viewed or saved an item, have notification permissions, and can be matched to available similar listings within a time window. A strong answer would organize around four pillars: metric hierarchy, experiment design, segmentation, and risk monitoring. The primary metric could be `incremental_qualified_listing_views_per_eligible_user` or `buyer_seller_message_threads_per_user`, with secondary metrics like `notification_open_rate`, `save_rate`, and `return_sessions`. Guardrails should include `push_opt_out_rate`, `notification_settings_disable_rate`, `hide_report_rate`, total notifications received, and `7d_retention`.
The experiment would likely randomize at the user level: treatment users can receive similar-listing pushes, while control users continue with existing notification policy. You would analyze `ITT` first to preserve randomization, then separately inspect exposed users to understand mechanism. A key tradeoff is choosing `CTR` versus downstream marketplace actions: `CTR` is sensitive and fast, but it can reward low-quality or overly frequent notifications, so it should not be the sole success metric. You would also check cannibalization by comparing total marketplace views and messages, not just clicks from this notification. Close by saying that with more time, you would estimate heterogeneous effects by marketplace intent, notification sensitivity, and inventory density, then use those insights for targeted rollout rather than a blanket launch.
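Estimating heterogeneous effects across several segments means running several tests at once, so the Holm-Bonferroni correction mentioned under Core knowledge applies. The following is a minimal sketch of Holm's step-down procedure; the function name `holm_bonferroni`, the segment names, and the p-values are hypothetical and chosen only to show the mechanics.

```python
def holm_bonferroni(p_values: dict, alpha: float = 0.05) -> dict:
    """Return {segment: rejected?} using Holm's step-down procedure."""
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    decisions = {}
    still_rejecting = True
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value (k = rank + 1) is compared against
        # alpha / (m - k + 1); once one test fails, all larger ones fail too.
        if still_rejecting and p <= alpha / (m - rank):
            decisions[name] = True
        else:
            still_rejecting = False
            decisions[name] = False
    return decisions


# Hypothetical segment-level p-values, for illustration only.
segment_p_values = {
    "high_intent": 0.004,
    "low_intent": 0.03,
    "dense_inventory": 0.02,
    "sparse_inventory": 0.40,
}
print(holm_bonferroni(segment_p_values))
```

With these made-up inputs only `high_intent` survives the correction, even though three segments have unadjusted p-values below 0.05, which is the argument for pre-registering a primary metric before slicing.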
A second angle
For “Design a clustered notification experiment with guardrails,” the same evaluation logic applies, but independence assumptions become the main constraint. Instead of randomizing individual users, you may randomize clusters such as households, social graph components, geographic markets, or seller-buyer communities to reduce spillovers. The analysis must account for intracluster correlation using cluster-robust standard errors, cluster-level aggregation, or hierarchical models. The power calculation should use the design effect, because 1 million users in large correlated clusters may behave like far fewer independent observations. Guardrails become especially important because cluster-level treatment may change marketplace liquidity, seller response times, or buyer competition in ways that affect untreated users.
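Of the analysis options named above, cluster-level aggregation is the simplest to sketch: collapse each cluster to one number, then compare arms using clusters as the unit. The code below is a minimal illustration, assuming rows of `(cluster_id, arm, outcome)` and at least two clusters per arm; the function name `cluster_level_effect` and the toy household data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def cluster_level_effect(rows):
    """rows: iterable of (cluster_id, arm, outcome), arm in {"treatment", "control"}.

    Returns (difference in mean cluster-level outcome, standard error).
    """
    outcomes_by_cluster = defaultdict(list)
    arm_of = {}
    for cluster_id, arm, outcome in rows:
        outcomes_by_cluster[cluster_id].append(outcome)
        arm_of[cluster_id] = arm  # every user in a cluster shares one assignment

    cluster_means = {"treatment": [], "control": []}
    for cluster_id, outcomes in outcomes_by_cluster.items():
        cluster_means[arm_of[cluster_id]].append(mean(outcomes))

    t, c = cluster_means["treatment"], cluster_means["control"]
    diff = mean(t) - mean(c)
    # Variance is taken across clusters, so within-cluster correlation
    # cannot shrink the standard error artificially.
    se = math.sqrt(variance(t) / len(t) + variance(c) / len(c))
    return diff, se


# Toy data, invented for illustration: four households, two per arm.
rows = [
    ("h1", "treatment", 2), ("h1", "treatment", 4), ("h2", "treatment", 5),
    ("h3", "control", 1), ("h3", "control", 3), ("h4", "control", 2),
]
print(cluster_level_effect(rows))
```

Averaging cluster means without weights estimates the effect on a typical cluster rather than a typical user; if cluster sizes vary widely, that difference is worth stating explicitly in the interview.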
Common pitfalls
Pitfall: Optimizing for `notification_click_rate` alone.
This is the classic analytical mistake. A notification can produce high `CTR` by being urgent, vague, or frequent while increasing opt-outs and reducing trust. A stronger answer ties success to incremental downstream value and includes fatigue guardrails.
Pitfall: Being vague about exposure and eligibility.
Saying “compare users who got notifications to users who didn’t” is not enough, because users who receive notifications are usually more active, more permissioned, and more likely to have relevant inventory. Define the randomized eligible population first, then distinguish assignment, delivery, impression, open, and click.
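A small sketch can make the gap between the two comparisons concrete. It assumes a user-level table with the randomized assignment, an exposure flag, and an outcome; the `UserRecord` fields, function names, and the eight toy users are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class UserRecord:
    user_id: str
    assigned: str    # "treatment" or "control", set by randomization
    opened: bool     # whether the user opened a similar-listing push
    outcome: float   # e.g. qualified listing views in the analysis window


def itt_effect(records):
    """Difference in mean outcome by randomized assignment (intent-to-treat)."""
    treated = [r.outcome for r in records if r.assigned == "treatment"]
    control = [r.outcome for r in records if r.assigned == "control"]
    return mean(treated) - mean(control)


def naive_exposed_effect(records):
    """Biased comparison: users who opened a push vs. all control users."""
    openers = [r.outcome for r in records
               if r.assigned == "treatment" and r.opened]
    control = [r.outcome for r in records if r.assigned == "control"]
    return mean(openers) - mean(control)


# Toy data: openers are the already-active users, non-openers are not.
records = [
    UserRecord("u1", "treatment", True, 6.0),
    UserRecord("u2", "treatment", True, 4.0),
    UserRecord("u3", "treatment", False, 1.0),
    UserRecord("u4", "treatment", False, 1.0),
    UserRecord("u5", "control", False, 3.0),
    UserRecord("u6", "control", False, 3.0),
    UserRecord("u7", "control", False, 2.0),
    UserRecord("u8", "control", False, 4.0),
]
print("ITT effect:", itt_effect(records))
print("Naive exposed-vs-control effect:", naive_exposed_effect(records))
```

In this toy data the ITT effect is zero while the exposed-only comparison shows a positive lift, purely because the users who open notifications were more active to begin with.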
Pitfall: Ignoring interference and repeated treatment.
Notifications are not one-shot independent events. Users receive many notifications, sellers may respond differently when buyer demand shifts, and one user’s action can affect another user’s marketplace experience. Call out repeated-measures correlation, cannibalization, and cluster/spillover concerns explicitly.
Connections
Interviewers may pivot from here into ranking evaluation, especially how to judge whether “similar listings” are actually relevant, or into long-term experimentation, such as novelty effects and notification fatigue. They may also ask about SQL aggregation, cohort analysis, sequential testing, or marketplace experimentation where buyer and seller outcomes must be balanced.
Further reading
- Trustworthy Online Controlled Experiments by Kohavi, Tang, and Xu: practical reference for experiment design, metrics, pitfalls, and launch decisions.
- Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data by Deng et al.: the original CUPED paper, whose variance reduction ideas are widely used in large-scale experimentation.
- Design and Analysis of Group-Randomized Trials by Murray: deeper treatment of intracluster correlation, design effects, and clustered experiment analysis.
Practice questions
- Find multi-account buckets and unread rate (Meta · Data Scientist · Technical Screen · medium)
- How to evaluate similar-listing notifications feature (Meta · Data Scientist · Technical Screen · easy)
- Evaluate new-product notification feature (Meta · Data Scientist · Technical Screen · medium)
- Design a clustered notification experiment with guardrails (Meta · Data Scientist · Technical Screen · medium)
- Design an A/B test for pinned-unread feature (Meta · Data Scientist · Technical Screen · hard)
- Evaluate Success of 'Similar Listings' Notification Feature (Meta · Data Scientist · Technical Screen · medium)
- Leverage Data Sources for Effective Push Notification Strategy (Meta · Data Scientist · Technical Screen · medium)