
Design Messenger spam experiment with clustering

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's skills in experimental design, causal inference, clustering and interference analysis, metric specification, and power/sample-size calculation within the Analytics & Experimentation domain for Data Scientist roles.

  • hard
  • Meta
  • Analytics & Experimentation
  • Data Scientist

Company: Meta

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: hard

Interview Round: Technical Screen


Oct 13, 2025, 9:49 PM
Experiment Design: Spam-Detection Algorithm for Messenger

You are evaluating a new spam-detection algorithm that routes suspected spam into a separate folder and slightly delays delivery for additional checks. Design the experiment, decide whether to launch, and explicitly assess whether cluster randomization is appropriate.

Answer the following:

  1. Primary decision and metrics
    • Define the primary success metric as "spam reply rate" = probability a recipient replies within 24 hours to a message flagged as spam.
    • Propose at least three guardrail metrics (e.g., delivery latency, false-positive rate on non-spam, user-initiated spam reports) and two secondary metrics (e.g., block rate, conversation retention).
    • For each metric, specify precise denominators and attribution windows.
  2. Unit of randomization
    • Compare individual delivery-level randomization vs cluster randomization at: a) conversation/thread, b) recipient-user (ego), and c) geo (country or data-center switchback).
    • For each, identify interference pathways in messaging networks (e.g., sender in control → recipient in treatment; multi-party threads; new threads formed mid-test) and when SUTVA is most likely violated.
    • State which unit you choose and why.
  3. Cluster randomization pitfalls
    • Explain problems unique to cluster designs: inflated variance from ICC, unequal/variable cluster sizes, cluster drift (members join/leave threads), and treatment leakage (new threads not bound to cluster).
    • Give concrete mitigation tactics: cluster locking via stable hashing, intent-to-treat with cluster-level assignment logs, cluster-robust (CR2/CR3) or randomization-inference SEs, and weighting choices (cluster-weighted vs message-weighted) with rationale.
  4. Power and sample size
    • Suppose baseline spam reply rate is 2.0%, target to detect a 10% relative reduction (to 1.8%), α = 0.05, power = 0.8, 7-day test. You expect 200M suspected-spam messages/day globally.
    • If clustering by thread with average m = 3 suspected-spam messages per thread over 7 days and ICC = 0.07 for the 24h-reply outcome: (i) compute the design effect DEFF = 1 + (m − 1)·ICC, (ii) compute the effective sample size versus individual randomization, and (iii) explain how this changes the MDE.
    • If instead clustering by recipient with m = 20 messages/recipient and ICC = 0.02, repeat the calculation and recommend a design.
  5. Analysis plan
    • Detail the estimator and inference: difference-in-means at the cluster level vs message level with CR2/CR3 SEs or mixed-effects logistic regression (random intercept for cluster), CUPED using a 14-day pre-period, and a pre-registered tie-break for ambiguous threads (e.g., cluster by min(user_id) hash).
    • Specify how you’ll handle multiple exposure types (flag only vs flag+delay) and noncompliance.
  6. Launch decision
    • Given plausible effect sizes (e.g., −8% to −12% relative on spam reply rate) and guardrails not regressing, state the quantitative launch criterion and the minimal ramp plan (e.g., 5% → 25% → 100%) with geo holdout and a 1% long-term holdback for ongoing monitoring.
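
The design-effect arithmetic in item 4 can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes a 50/50 split, that all 200M messages/day are enrolled for the full 7 days, and equal cluster sizes (the formula DEFF = 1 + (m − 1)·ICC is only approximate when cluster sizes vary). The function names and the normal-approximation sample-size formula for two proportions are choices made for this sketch, not part of the question.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(m: float, icc: float) -> float:
    """DEFF = 1 + (m - 1) * ICC, assuming equal cluster sizes."""
    return 1 + (m - 1) * icc


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Messages per arm under individual randomization (normal approximation,
    two-sided test of two independent proportions)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return z ** 2 * variance_sum / (p1 - p2) ** 2


# Inputs taken from the question; full enrollment for 7 days is an assumption.
TOTAL_MESSAGES = 200_000_000 * 7
BASELINE, TARGET = 0.020, 0.018

# (average cluster size m, ICC) for each candidate design
designs = {"thread": (3, 0.07), "recipient": (20, 0.02)}

base_n = n_per_arm(BASELINE, TARGET)
results = {}
for name, (m, icc) in designs.items():
    deff = design_effect(m, icc)
    results[name] = {
        "deff": deff,
        # Clustered messages carry as much information as this many independent ones.
        "n_eff": TOTAL_MESSAGES / deff,
        # Standard errors, and hence the MDE, scale with sqrt(DEFF).
        "mde_inflation": sqrt(deff),
        # Messages per arm needed once clustering is accounted for.
        "n_required_per_arm": base_n * deff,
    }

for name, r in results.items():
    print(
        f"{name}: DEFF={r['deff']:.2f}, n_eff={r['n_eff']:,.0f}, "
        f"MDE x{r['mde_inflation']:.3f}, need/arm={r['n_required_per_arm']:,.0f}"
    )
```

Under these assumptions, DEFF is 1.14 for thread clustering and 1.38 for recipient clustering, so the 1.4B messages behave like roughly 1.23B and 1.01B independent observations, and the MDE grows by about 7% and 17% respectively. In both cases the available traffic far exceeds what is needed to detect a 10% relative reduction, so the choice of unit is likely to hinge on interference and leakage rather than on power.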
