Design metrics and experiment for stolen-post detection
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: Technical Screen
You work on **Stolen Post Detection** for a social platform (detecting content that is copied/reposted without permission).
A new detection algorithm is proposed (e.g., a model producing a stolen-probability score used to downrank, label, or block posts).
## Questions
1) **Problem framing & diagnostics**
- What are the key failure modes and risks (false positives vs false negatives) for stolen-post detection?
- If stakeholders report “stolen posts are down,” what would you check to confirm the drop is real rather than an artifact (measurement issues, reporting changes, seasonality, policy changes, spam shifts, etc.)?
2) **Metrics**
Propose:
- **Primary success metric(s)** (what you ultimately want to improve)
- **Diagnostic metrics** (to understand why things moved)
- **Guardrail metrics** (to prevent harm)
Include at least one metric that handles delayed / noisy ground truth (since “stolen” labels may come from user reports, manual review, or appeals).
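One way to approach the delayed-label requirement is to estimate prevalence from a random audit sample sent to manual review, counting only posts old enough for their labels to have settled. The sketch below is illustrative, not a prescribed answer: the record fields `age_days` and `is_stolen`, the function names, and the 14-day maturity window are all assumptions introduced here. It reports a Wilson score interval so that the noise from a small audit sample stays visible.

```python
import math


def wilson_interval(positives, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = positives / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def matured_prevalence(reviews, maturity_days=14):
    """Estimate stolen-post prevalence from audited posts with settled labels.

    `reviews` is a list of dicts with hypothetical keys:
      - "age_days": days since the post was created
      - "is_stolen": final reviewer label (after any appeal)
    Posts younger than `maturity_days` are excluded, because reports and
    appeals may still change their label.
    """
    matured = [r for r in reviews if r["age_days"] >= maturity_days]
    n = len(matured)
    positives = sum(1 for r in matured if r["is_stolen"])
    low, high = wilson_interval(positives, n)
    return positives / n, low, high, n
```

Computing this per experiment arm on the same maturity window keeps the comparison fair, since an arm whose labels are still arriving would otherwise look artificially clean.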
3) **Experiment design**
Design an online experiment (A/B test or alternative) to evaluate the new algorithm. Address:
- Randomization unit (post-level vs author-level vs viewer-level) and why
- Interference / network effects (e.g., copied content affects multiple creators)
- Exposure definition (who is affected by the change)
- Sample size / power considerations at a high level (what drives variance)
- Ramp plan and decision criteria
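For the sample size bullet, a rough calculation can show what drives variance: the baseline rate, the size of the effect to detect, and the correlation among posts from the same author when randomizing at the author level. The sketch below uses the standard normal-approximation formula for comparing two proportions and inflates it by the design effect `1 + (m - 1) * icc`. The function name, parameter names, and example rates are assumptions made for illustration, not values from the question.

```python
import math
from statistics import NormalDist


def posts_per_arm(p_control, p_treatment, alpha=0.05, power=0.8,
                  posts_per_author=1.0, icc=0.0):
    """Approximate posts needed per arm to detect p_control vs p_treatment.

    Uses the two-sided normal approximation for two proportions, then
    multiplies by a design effect for author-level randomization:
      design_effect = 1 + (posts_per_author - 1) * icc
    where `icc` is the intra-author correlation of the outcome.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n_independent = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2
    design_effect = 1 + (posts_per_author - 1) * icc
    return math.ceil(n_independent * design_effect)


# Hypothetical inputs: 2.0% baseline stolen-post prevalence, 1.8% under treatment.
independent = posts_per_arm(0.02, 0.018)
clustered = posts_per_arm(0.02, 0.018, posts_per_author=5, icc=0.2)
```

With these made-up inputs the clustered requirement is 1.8 times the independent one, which is why a rare outcome combined with author-level clustering tends to dominate the power discussion.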
4) **Tradeoffs and decision**
If offline metrics improve (e.g., higher precision/recall on labeled data) but online engagement drops, how would you decide what to launch and what follow-ups you’d run?
Quick Answer: This question evaluates metric design, diagnostic analysis, and online experiment methodology for a Data Scientist role in Analytics & Experimentation, with a product-level focus on policy-sensitive detection and its measurement challenges.