What does the Meta Data Scientist interview process look like?

Based on candidate reports compiled in this guide, the Meta Data Scientist loop typically includes two stages: a Technical Screen and an Onsite. Each stage covers a distinct set of topics, walked through in detail in this guide.

What topics does Meta focus on in Data Scientist interviews?

Meta Data Scientist interviews cover Data Manipulation (SQL/Python), Analytics & Experimentation, Statistics & Math, Machine Learning, and Behavioral & Leadership. This guide breaks each topic down into core concepts, worked examples, and the real questions candidates were asked.

How many real Meta Data Scientist interview questions are in this guide?

This guide is anchored to 33 real Meta Data Scientist interview questions sourced from candidate reports, each linked to a full practice page with starter code, solution discussion, and community comments.

Meta Data Scientist Interview Prep Guide

Everything Meta actually asks Data Scientist candidates — concept walkthroughs, worked examples, and the real interview questions, drawn from candidate reports. Free to read.


Technical Screen

Data Manipulation (SQL/Python)

SQL Event Log Analytics — covered in depth under Onsite below.

Analytics & Experimentation

A/B Testing And Experiment Design — covered in depth under Onsite below.
Cluster Randomized Experiments And Network Interference — covered in depth under Onsite below.
Video Calling And Group Calls Product Analytics — covered in depth under Onsite below.

Notifications And Push Notification Analytics

Top-down metric tree infographic: north-star metric for notifications, branches to Primary metrics, Notification funnel drivers, and Guardrails & experiment design with real metric names and arrows.

What's being tested

Interviewers are probing whether you can evaluate a notification product as a causal inference and measurement design problem, not just as a dashboarding exercise. Strong answers define who is eligible, what counts as exposure, what success means, and how to separate incremental value from users who were already highly engaged. Meta cares because notifications can increase marketplace liquidity, message responses, and retention, but they can also create fatigue, opt-outs, spam reports, and long-term engagement decay. The interviewer is looking for disciplined tradeoffs: growth versus user experience, short-term clicks versus durable marketplace outcomes, and individual-level effects versus network or marketplace spillovers.

Core knowledge

Notification funnels should be decomposed into eligibility, send, delivery, impression/open, click, landing-page engagement, downstream action, and long-term retention. For marketplace, downstream metrics might include `listing_view`, `save`, `seller_message`, `offer_sent`, `purchase_intent`, or `transaction_proxy`, not just `notification_click_rate`.
Primary metrics should reflect the product’s intended causal mechanism. For similar-listing alerts, a better primary metric than `CTR` may be incremental `qualified_listing_views_per_user` or `buyer_seller_message_threads_per_eligible_user`, because clickbait notifications can raise clicks while lowering marketplace quality or trust.
Guardrail metrics are essential for push notifications because the treatment imposes attention costs. Common guardrails include `push_opt_out_rate`, `notification_disable_rate`, `app_uninstall_rate`, `hide_report_rate`, `negative_feedback_rate`, `session_depth`, `7d_retention`, and total notification volume per user.
The randomization unit is usually the user when notification eligibility is independent across users, but cluster randomization may be needed when users interact, share devices, belong to households, or participate in marketplaces with supply-demand interference. Randomizing at the notification-event level risks cross-contamination and a confusing user experience, because the same user would see both treatment and control behavior.
The eligibility definition must be fixed before analysis: for example, users who viewed or saved a marketplace item in the past 7 days, have push permissions enabled, and have at least one similar listing available. Analyze both `intent-to-treat` over all eligible randomized users and treatment-on-treated for exposed users, with the caveat that exposure is not randomized, so comparisons restricted to exposed users can carry selection bias.
Power and MDE should be discussed at the user level, not the notification level, because multiple sends to the same person are correlated. For a two-sided test with equal-sized arms, a basic minimum detectable effect is approximately
$MDE \approx (z_{1-\alpha/2}+z_{1-\beta})\sqrt{\frac{2\sigma^2}{n}}$
where $n$ is the number of users per arm, $\sigma^2$ is the per-user outcome variance, and $1-\beta$ is the target power; clustered designs inflate the variance.
Clustered experiments require the design effect:
$DE = 1 + (m - 1)\rho$
where $m$ is the average cluster size and $\rho$ is the intracluster correlation. If households, social clusters, or geographic markets have $\rho=0.02$ and $m=50$, then $DE = 1 + 49 \times 0.02 = 1.98$, so the effective sample size drops by roughly half.
CUPED can reduce variance using pre-experiment covariates, especially historical marketplace engagement or prior notification responsiveness. The adjusted metric is $Y' = Y - \theta(X-\bar X)$, where $\theta = \frac{Cov(Y,X)}{Var(X)}$; it helps most when pre-period and post-period behavior are highly correlated.
Multiple testing matters when slicing by country, platform, buyer/seller role, notification type, or engagement cohort. Pre-register a primary metric, and for exploratory subgroup analysis either apply a correction like Holm-Bonferroni or control the false discovery rate with Benjamini-Hochberg.
Heterogeneous treatment effects are often central for notifications. New users may need helpful prompts, while power users may experience fatigue. Segment by notification permission status, historical open rate, marketplace intent, inventory density, platform, and prior mute/negative feedback behavior.
Interference and cannibalization are common. Similar-listing notifications may shift views from organic feed, search, saved items, or other notifications rather than create new demand. Measure incremental total marketplace engagement, not only engagement attributable to the new notification surface.
Unread-rate analysis should be user-centered. For multi-account or multi-device users, compute metrics such as `unread_notifications_per_user`, `users_with_unread_rate_gt_50pct`, or bucketed account counts carefully; otherwise heavy users dominate averages and mask whether the feature worsens notification overload.
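To make the CUPED adjustment from the list above concrete, here is a minimal sketch using only the Python standard library. The data is synthetic and the helper name `cuped_adjust` is hypothetical; in a real experiment you would typically estimate $\theta$ on data pooled across arms and then compare adjusted means between treatment and control.

```python
import random
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return (adjusted outcomes, theta) for Y' = Y - theta * (X - mean(X))."""
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance of Y and X, using the same n - 1 denominator as variance().
    cov_yx = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (len(x) - 1)
    theta = cov_yx / variance(x)
    return [yi - theta * (xi - x_bar) for yi, xi in zip(y, x)], theta


random.seed(0)
# Synthetic data (an assumption for illustration): pre-period engagement x
# and a post-period outcome y that is strongly correlated with it.
x = [random.gauss(10, 3) for _ in range(5000)]
y = [0.8 * xi + random.gauss(0, 2) for xi in x]

y_adj, theta = cuped_adjust(y, x)
variance_ratio = variance(y_adj) / variance(y)
print(f"theta={theta:.3f}, variance ratio after CUPED={variance_ratio:.3f}")
```

The adjustment leaves the mean unchanged and scales the variance by $1-\rho_{XY}^2$, where $\rho_{XY}$ is the correlation between the covariate and the outcome, which is why a weakly correlated pre-period covariate buys very little.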

Worked example

For “How to evaluate similar-listing notifications feature,” start by clarifying the product goal: are we trying to increase buyer discovery, accelerate marketplace transactions, or re-engage users who showed shopping intent? Then define the eligible population: users who viewed or saved an item, have notification permissions, and can be matched to available similar listings within a time window. A strong answer would organize around four pillars: metric hierarchy, experiment design, segmentation, and risk monitoring. The primary metric could be `incremental_qualified_listing_views_per_eligible_user` or `buyer_seller_message_threads_per_user`, with secondary metrics like `notification_open_rate`, `save_rate`, and `return_sessions`. Guardrails should include `push_opt_out_rate`, `notification_settings_disable_rate`, `hide_report_rate`, total notifications received, and `7d_retention`.

The experiment would likely randomize at the user level: treatment users can receive similar-listing pushes, while control users continue with the existing notification policy. You would analyze `ITT` first to preserve randomization, then separately inspect exposed users to understand mechanism. A key tradeoff is choosing between `CTR` and downstream marketplace actions: `CTR` is sensitive and fast, but it can reward low-quality or overly frequent notifications, so it should not be the sole success metric. You would also check cannibalization by comparing total marketplace views and messages, not just clicks from this notification. Close by saying that with more time, you would estimate heterogeneous effects by marketplace intent, notification sensitivity, and inventory density, then use those insights for targeted rollout rather than a blanket launch.
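The `ITT` step above can be sketched as follows. This is a toy illustration, not a production analysis: the user IDs, the `assignments` and `events` structures, and the helper `itt_effect` are all hypothetical. The point it shows is that events are first rolled up to one row per assigned user, and that assigned users with no activity stay in the denominator as zeros.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

# Hypothetical assignment table: every randomized eligible user appears here.
assignments = {
    "u1": "treatment", "u2": "treatment", "u3": "treatment",
    "u4": "control", "u5": "control", "u6": "control",
}
# Hypothetical event rows: (user_id, message threads started).
events = [("u1", 1), ("u1", 1), ("u2", 1), ("u4", 1)]

# Aggregate to the user level; users with no events keep a value of 0.
per_user = {user: 0 for user in assignments}
for user, value in events:
    if user in per_user:
        per_user[user] += value


def itt_effect(treatment, control, alpha=0.05):
    """Difference in means over all assigned users, with a normal-approximation CI."""
    diff = mean(treatment) - mean(control)
    se = sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, se, (diff - z * se, diff + z * se)


treatment = [v for u, v in per_user.items() if assignments[u] == "treatment"]
control = [v for u, v in per_user.items() if assignments[u] == "control"]
diff, se, ci = itt_effect(treatment, control)
print(f"ITT diff={diff:.3f}, se={se:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

With only three users per arm the normal approximation is not trustworthy; the sample is kept tiny so the arithmetic is easy to follow by hand.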

A second angle

For “Design a clustered notification experiment with guardrails,” the same evaluation logic applies, but independence assumptions become the main constraint. Instead of randomizing individual users, you may randomize clusters such as households, social graph components, geographic markets, or seller-buyer communities to reduce spillovers. The analysis must account for intracluster correlation using cluster-robust standard errors, cluster-level aggregation, or hierarchical models. The power calculation should use the design effect, because 1 million users in large correlated clusters may behave like far fewer independent observations. Guardrails become especially important because cluster-level treatment may change marketplace liquidity, seller response times, or buyer competition in ways that affect untreated users.
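As a rough illustration of how the design effect feeds into the power calculation, the sketch below combines the MDE approximation and the design-effect formula from Core knowledge. The function names, the 500,000 users per arm, and the unit outcome variance are assumptions chosen for illustration; $m=50$ and $\rho=0.02$ are the values used earlier in this section.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(m, rho):
    """DE = 1 + (m - 1) * rho for average cluster size m and intracluster correlation rho."""
    return 1 + (m - 1) * rho


def mde(sigma, n_per_arm, alpha=0.05, power=0.8, deff=1.0):
    """Approximate two-sided MDE for equal arms, with variance inflated by deff."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 * sigma**2 * deff / n_per_arm)


n_per_arm = 500_000  # assumed arm size, for illustration only
deff = design_effect(m=50, rho=0.02)
n_effective = n_per_arm / deff

mde_user = mde(sigma=1.0, n_per_arm=n_per_arm)
mde_clustered = mde(sigma=1.0, n_per_arm=n_per_arm, deff=deff)
print(f"design effect={deff:.2f}, effective n per arm={n_effective:,.0f}")
print(f"MDE user-level={mde_user:.5f}, clustered={mde_clustered:.5f}")
```

Because the MDE scales with the square root of the variance, a design effect near 2 raises the detectable effect by about 40 percent rather than doubling it.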

Common pitfalls

Pitfall: Optimizing for `notification_click_rate` alone.

This is the classic analytical mistake. A notification can produce high `CTR` by being urgent, vague, or frequent while increasing opt-outs and reducing trust. A stronger answer ties success to incremental downstream value and includes fatigue guardrails.

Pitfall: Being vague about exposure and eligibility.

Saying “compare users who got notifications to users who didn’t” is not enough, because users who receive notifications are usually more active, more permissioned, and more likely to have relevant inventory. Define the randomized eligible population first, then distinguish assignment, delivery, impression, open, and click.
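One way to keep these stages distinct is to report each one as a share of the same randomized population. The sketch below is a toy example under that assumption: the stage names follow the sentence above, while the `funnel_rates` helper and the per-user stage sets are hypothetical.

```python
STAGES = ["assigned", "delivered", "impression", "open", "click"]


def funnel_rates(users):
    """Share of assigned users reaching each stage.

    The denominator is always the full assigned population, so the rates
    stay comparable between treatment and control arms.
    """
    n = len(users)
    return {stage: sum(1 for reached in users if stage in reached) / n for stage in STAGES}


# Hypothetical treatment arm: one set of reached stages per assigned user.
treatment_users = [
    {"assigned", "delivered", "impression", "open", "click"},
    {"assigned", "delivered", "impression"},
    {"assigned", "delivered"},
    {"assigned"},
]

rates = funnel_rates(treatment_users)
for stage in STAGES:
    print(f"{stage}: {rates[stage]:.2f}")
```

Dividing clicks by opens instead would condition on a post-assignment behavior, which is exactly the selection problem this pitfall describes.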

Pitfall: Ignoring interference and repeated treatment.

Notifications are not one-shot independent events. Users receive many notifications, sellers may respond differently when buyer demand shifts, and one user’s action can affect another user’s marketplace experience. Call out repeated-measures correlation, cannibalization, and cluster/spillover concerns explicitly.

Connections

Interviewers may pivot from here into ranking evaluation, especially how to judge whether “similar listings” are actually relevant, or into long-term experimentation, such as novelty effects and notification fatigue. They may also ask about SQL aggregation, cohort analysis, sequential testing, or marketplace experimentation where buyer and seller outcomes must be balanced.

Analyze Key Metrics for Notification System Success

Evaluates metric design, causal reasoning, experiment setup, diagnostics, SQL/statistical checks, and recommendations in a realistic interview setting...
