Design an experiment to estimate the impact of fake-account removal
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Onsite
Your team will remove detected fake accounts and wants to estimate the causal impact on real users' experience. Design an end-to-end experiment plan that addresses network interference and product trade-offs.
Be specific:
1) Experiment unit and randomization: Choose and justify between user-level, ego-network cluster, or geography-level randomization. Describe how you would construct clusters to minimize cross-treatment contamination while maintaining power.
2) Primary and guardrail metrics: Specify exact definitions (e.g., comments_per_view, 7-day retention of real accounts, abuse reports per 1K views). Define metric windows and whether they are exposure- or calendar-based.
3) Power and duration: Provide a concrete back-of-envelope sample-size calculation assuming a 0.5% relative change in comments_per_view, baseline 0.12, overdispersion, and an intra-cluster correlation of 0.02.
4) Interference diagnostics: Propose two separate tests to quantify spillovers (e.g., ghost exposure analysis for users connected to treated removals; edge-cut A/A). Define the expected result under the null for each test.
5) Noncompliance and misclassification: Detection is imperfect. Outline an IV or CUPED/DID approach to recover the local average treatment effect (LATE) using removal intensity as an instrument; list the identifying assumptions and falsification checks.
6) Ramp and ethics: Define a staged rollout with kill-switch criteria using guardrails (e.g., creator reach drop >1% with p<0.05). Include how you will prevent label leakage in feeds and notifications.
7) Beyond experimentation: If randomization is infeasible in some markets, provide a quasi-experimental backup (synthetic control or staggered DID) and specify exact covariates required from logs. Conclude with a product recommendation you might make if engagement dips short-term but abuse reports drop materially.
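For point 3, the back-of-envelope power calculation can be sketched as follows. The overdispersion factor (phi = 2) and average cluster size (m = 200) are illustrative assumptions not given in the prompt; the 0.5% relative lift, 0.12 baseline, and 0.02 ICC come from the question.

```python
def cluster_sample_size(baseline, rel_change, phi, icc, m,
                        alpha=0.05, power=0.80):
    """Per-arm users needed to detect rel_change in an overdispersed
    rate under cluster randomization (normal approximation)."""
    delta = rel_change * baseline         # absolute effect size
    z = 1.96 + 0.84                       # z_{alpha/2} + z_{beta} for 5% / 80%
    var = phi * baseline                  # overdispersed Poisson-like variance
    n_iid = 2 * z**2 * var / delta**2     # per-arm n under user-level randomization
    deff = 1 + (m - 1) * icc              # design effect from clustering
    return n_iid * deff

# Illustrative: 0.5% relative lift on baseline 0.12, phi=2, ICC=0.02, m=200.
n = cluster_sample_size(0.12, 0.005, phi=2.0, icc=0.02, m=200)
print(f"~{n / 1e6:.0f}M users per arm")  # design effect here is ~5x
```

The takeaway a candidate should reach: a 0.5% relative effect on a noisy rate is tiny, and the design effect from clustering multiplies an already large user-level n roughly fivefold, which motivates variance reduction (CUPED) and careful cluster sizing.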
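For point 4, the edge-cut A/A diagnostic has a concrete expected null: with no treatment actually applied, a regression of a user's outcome on the fraction of their neighbors assigned to the other arm should yield a slope indistinguishable from zero. A minimal simulation of that null check (all data synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Synthetic A/A: arms assigned at random, no treatment actually applied.
cross_frac = rng.uniform(0, 1, n)   # stand-in for fraction of neighbors in the other arm
outcome = rng.normal(0.12, 0.3, n)  # e.g. comments_per_view, independent of assignment

# OLS slope of outcome on cross-arm exposure; expected ~0 under the null.
slope = np.cov(cross_frac, outcome)[0, 1] / np.var(cross_frac, ddof=1)
se = outcome.std() / (cross_frac.std() * np.sqrt(n))
print(f"slope={slope:.4f}, |t|={abs(slope) / se:.2f}")  # |t| should be unremarkable
```

Running the same regression on the real edge-cut A/A data and finding a significant slope would indicate contamination in the cluster construction before any treatment ships.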
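For point 5, the noncompliance logic can be illustrated with a Wald/2SLS estimator on synthetic data: assignment to an aggressive-removal arm is the instrument, actual removal exposure is the imperfectly detected treatment, and the ratio of the intent-to-treat effect to the first stage recovers the LATE. Every number below is simulated for illustration, not Meta data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
late_true = 0.05  # true effect of removal exposure on compliers (synthetic)

z = rng.integers(0, 2, n)            # instrument: assigned removal intensity (0/1)
complier = rng.random(n) < 0.7       # detection only catches ~70% of cases
d = z * complier                     # actual removal exposure (noncompliance)
y = 0.12 + late_true * d + rng.normal(0, 0.3, n)  # outcome, e.g. comments_per_view

itt = y[z == 1].mean() - y[z == 0].mean()          # intent-to-treat effect
first_stage = d[z == 1].mean() - d[z == 0].mean()  # compliance rate
late_hat = itt / first_stage                       # Wald estimator of the LATE
print(f"ITT={itt:.4f}, first stage={first_stage:.3f}, LATE={late_hat:.4f}")
```

The key assumptions to state in the interview map directly onto the code: the instrument moves exposure (strong first stage), affects outcomes only through removals (exclusion), and is randomly assigned; a falsification check is to run the same estimator on pre-period outcomes, where the estimate should be null.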
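For point 7, the synthetic-control backup can be sketched as a constrained regression: choose nonnegative donor-market weights so the weighted donors track the treated market's pre-period series, then project the post-period counterfactual. This toy uses SciPy's nonnegative least squares on synthetic series; a production version would enforce the sum-to-one constraint inside the optimizer and match on the covariates listed from logs.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
t_pre, n_donors = 24, 6                    # 24 pre-period weeks, 6 donor markets

donors_pre = rng.normal(1.0, 0.2, (t_pre, n_donors))  # donor metric series (synthetic)
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0])     # treated is a convex combo
treated_pre = donors_pre @ w_true

w, resid = nnls(donors_pre, treated_pre)   # nonnegative weights fit on pre-period
w = w / w.sum()                            # renormalize toward the simplex
donors_next = rng.normal(1.0, 0.2, n_donors)          # next-week donor values
counterfactual_next = donors_next @ w      # projected treated outcome absent removal
print("weights:", np.round(w, 3), "pre-fit residual:", round(resid, 6))
```

The gap between the observed post-period series and `counterfactual_next` is the estimated effect; placebo runs on untreated markets give an informal significance check.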
Quick Answer: This question evaluates a data scientist's competency in experimental design and causal inference on networked social platforms. It covers unit and cluster randomization choices, metric selection and exposure windows, power estimation, interference diagnostics, handling of treatment misclassification, ethical ramping, and quasi-experimental backups.