Experiment Design: Spam-Detection Algorithm for Messenger
You are evaluating a new spam-detection algorithm that routes suspected spam into a separate folder and slightly delays delivery for additional checks. Design the experiment, decide whether to launch, and explicitly assess whether cluster randomization is appropriate.
Answer the following:
- Primary decision and metrics
  - Define the primary success metric as "spam reply rate": the probability that a recipient replies within 24 hours to a message flagged as spam.
  - Propose at least three guardrail metrics (e.g., delivery latency, false-positive rate on non-spam, user-initiated spam reports) and two secondary metrics (e.g., block rate, conversation retention).
  - For each metric, specify precise denominators and attribution windows (a minimal sketch follows this list).
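To make "precise denominators and attribution windows" concrete, here is a minimal pandas sketch of the primary metric and one guardrail. The dataset and column names (`flagged_messages.parquet`, `sent_at`, `first_reply_at`, `user_reported_spam`) are hypothetical placeholders, not a real schema:

```python
import pandas as pd

# Hypothetical event log: one row per message flagged as suspected spam.
# Assumed columns: message_id, recipient_id, sent_at, first_reply_at
# (NaT if the recipient never replied), user_reported_spam (bool,
# assumed to encode "reported within 7 days of delivery").
flagged = pd.read_parquet("flagged_messages.parquet")

WINDOW = pd.Timedelta(hours=24)

# Primary metric: spam reply rate.
# Denominator: all messages flagged as suspected spam during the test window.
# Numerator: those with a recipient reply within 24h of delivery.
# (NaT deltas compare as False, so never-replied messages count as non-replies.)
replied_24h = (flagged["first_reply_at"] - flagged["sent_at"]) <= WINDOW
spam_reply_rate = replied_24h.mean()

# Guardrail example: user-initiated spam-report rate over the same denominator.
report_rate = flagged["user_reported_spam"].mean()

print(f"spam reply rate (24h): {spam_reply_rate:.4%}")
print(f"user spam-report rate (7d): {report_rate:.4%}")
```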
- Unit of randomization
  - Compare individual delivery-level randomization vs. cluster randomization at:
    a) conversation/thread,
    b) recipient-user (ego), and
    c) geo (country or data-center switchback).
  - For each, identify the interference pathways in messaging networks (e.g., a sender in control messaging a recipient in treatment; multi-party threads; new threads formed mid-test) and state when SUTVA is most likely violated.
  - State which unit you choose and why (a hash-based assignment sketch follows this list).
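Whichever unit is chosen, assignment is typically a deterministic function of a stable key for that unit. A minimal sketch, assuming a hypothetical experiment salt and a 50/50 split; the key format strings are illustrative only:

```python
import hashlib

def assign(unit_key: str, salt: str = "spam-detect-exp-v1",
           treat_pct: int = 50) -> str:
    """Deterministic bucket assignment from a stable hash of the unit key.

    unit_key encodes the chosen randomization unit: a thread id for
    conversation-level clusters, a recipient user id for ego clusters,
    or a country code for a geo design. The salt (hypothetical) isolates
    this experiment's buckets from other concurrent experiments.
    """
    digest = hashlib.sha256(f"{salt}:{unit_key}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < treat_pct else "control"

# The same function supports each candidate unit; only the key changes:
print(assign("thread:8f3a2c"))  # conversation/thread cluster
print(assign("user:12345"))     # recipient (ego) cluster
print(assign("geo:BR"))         # country arm of a geo design
```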
- Cluster randomization pitfalls
  - Explain the problems unique to cluster designs: inflated variance from intracluster correlation (ICC), unequal and variable cluster sizes, cluster drift (members joining and leaving threads), and treatment leakage (new threads not bound to any cluster).
  - Give concrete mitigation tactics: cluster locking via stable hashing, intent-to-treat analysis backed by cluster-level assignment logs, cluster-robust (CR2/CR3) or randomization-inference standard errors, and weighting choices (cluster-weighted vs. message-weighted) with a rationale for each (a randomization-inference sketch follows this list).
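As one of the inference options named above, here is a minimal randomization-inference sketch on simulated per-cluster data. The cluster-weighted estimator (each cluster counts once regardless of size) and all numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_diff(cluster_means: np.ndarray, treat: np.ndarray) -> float:
    """Cluster-weighted difference in means (each cluster counts once)."""
    return cluster_means[treat].mean() - cluster_means[~treat].mean()

# Simulated stand-in: per-cluster 24h reply rates and their assignments.
n_clusters = 2000
cluster_means = rng.beta(2, 98, size=n_clusters)              # ~2% baseline
treat = rng.permutation(np.repeat([True, False], n_clusters // 2))
cluster_means[treat] *= 0.9                                   # -10% effect

observed = cluster_diff(cluster_means, treat)

# Randomization inference: re-randomize cluster labels many times to
# build the null distribution, then compute a two-sided p-value.
null = np.array([
    cluster_diff(cluster_means, rng.permutation(treat))
    for _ in range(5000)
])
p_value = np.mean(np.abs(null) >= abs(observed))
print(f"observed diff: {observed:.5f}, randomization p-value: {p_value:.4f}")
```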
- Power and sample size
  - Suppose the baseline spam reply rate is 2.0%, the target is to detect a 10% relative reduction (to 1.8%), α = 0.05, power = 0.8, and the test runs for 7 days. You expect 200M suspected-spam messages/day globally.
  - If clustering by thread, with an average of m = 3 suspected-spam messages per thread over 7 days and ICC = 0.07 for the 24h-reply outcome: (i) compute the design effect DEFF = 1 + (m − 1)·ICC, (ii) compute the effective sample size versus individual randomization, and (iii) explain how this changes the minimum detectable effect (MDE).
  - If instead clustering by recipient, with m = 20 messages per recipient and ICC = 0.02, repeat the calculation and recommend a design (a worked calculation follows this list).
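A worked version of both calculations, using the standard two-proportion z-test approximation and only the Python standard library. The effective sample size under clustering is n/DEFF, so the per-arm requirement inflates by DEFF and the MDE by roughly √DEFF:

```python
from statistics import NormalDist

def n_per_arm(p1: float, p2: float,
              alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate per-arm sample size for a two-proportion z-test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return z**2 * var / (p1 - p2) ** 2

base = n_per_arm(0.020, 0.018)   # ≈ 73,000 messages per arm, unclustered

for label, m, icc in [("thread", 3, 0.07), ("recipient", 20, 0.02)]:
    deff = 1 + (m - 1) * icc     # thread: 1.14, recipient: 1.38
    print(f"{label}: DEFF = {deff:.2f}, "
          f"n per arm ≈ {base * deff:,.0f} messages, "
          f"MDE inflation ≈ {deff**0.5:.2f}x")

# With ~1.4B suspected-spam messages over 7 days, either design is
# overwhelmingly powered at full traffic; DEFF instead governs how
# quickly an early read (e.g., a small ramp) reaches significance.
```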
- Analysis plan
  - Detail the estimator and inference: difference-in-means at the cluster level vs. the message level with CR2/CR3 SEs, or mixed-effects logistic regression (random intercept for cluster); CUPED using a 14-day pre-period (sketched below); and a pre-registered tie-break for ambiguous threads (e.g., cluster by the hash of min(user_id)).
  - Specify how you will handle multiple exposure types (flag only vs. flag + delay) and noncompliance.
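A minimal sketch of the CUPED step, assuming the covariate is the same metric computed per cluster over the 14-day pre-period; the simulated data is purely illustrative:

```python
import numpy as np

def cuped_adjust(y: np.ndarray, x_pre: np.ndarray) -> np.ndarray:
    """CUPED: residualize the outcome on a pre-period covariate.

    y     -- outcome in the test period (e.g., per-cluster 24h reply rate)
    x_pre -- same metric for the same clusters over the 14-day pre-period
    theta is the OLS slope of y on x_pre; the adjustment leaves the
    treatment-effect estimate unbiased while shrinking variance by
    roughly corr(y, x_pre) squared.
    """
    theta = np.cov(y, x_pre)[0, 1] / np.var(x_pre, ddof=1)
    return y - theta * (x_pre - x_pre.mean())

# Usage on simulated data: estimate theta on pooled arms, adjust, then
# take the difference in adjusted means between treatment and control.
rng = np.random.default_rng(1)
x_pre = rng.beta(2, 98, size=10_000)
y = 0.8 * x_pre + rng.normal(0, 0.005, size=10_000)
y_adj = cuped_adjust(y, x_pre)
print(f"variance reduction: {1 - y_adj.var() / y.var():.1%}")
```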
- Launch decision
  - Given plausible effect sizes (e.g., −8% to −12% relative on spam reply rate) and guardrails not regressing, state the quantitative launch criterion and the minimal ramp plan (e.g., 5% → 25% → 100%) with a geo holdout and a 1% long-term holdback for ongoing monitoring (a decision-rule sketch follows).
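One way to pre-register the criterion is as an executable rule. The thresholds here (the full 95% CI below a 5% relative reduction, and a +1% relative non-inferiority margin on each guardrail) are illustrative assumptions, not prescribed values:

```python
def should_launch(effect_ci: tuple[float, float],
                  guardrail_cis: dict[str, tuple[float, float]],
                  min_effect: float = -0.05,
                  guardrail_margin: float = 0.01) -> bool:
    """Pre-registered launch rule (illustrative thresholds only).

    Launch iff the 95% CI for the relative change in spam reply rate
    lies entirely below min_effect (at least a 5% relative reduction)
    AND every guardrail's CI upper bound stays within its relative
    non-inferiority margin.
    """
    _, effect_hi = effect_ci
    if effect_hi >= min_effect:
        return False
    return all(hi <= guardrail_margin for _, hi in guardrail_cis.values())

# Example read: -10% point estimate with CI (-13%, -7%); latency and
# false-positive guardrails within the +1% relative margin -> launch.
print(should_launch((-0.13, -0.07),
                    {"p95_latency": (-0.002, 0.004),
                     "false_positive_rate": (-0.010, 0.006)}))
```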