PracHub

Detect and evaluate "stolen" posts

Last updated: May 2, 2026

Quick Overview

This question evaluates data-science competencies in instrumentation and feature design, content-similarity detection and labeling, causal inference for experiments, measurement of harms, and model evaluation, and belongs to the Analytics & Experimentation domain.

Company: Tools For Humanity

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: easy

Interview Round: Technical Screen

Aug 29, 2025, 12:00 AM

You are a Data Scientist at a Twitter-like app. The platform suspects an increase in “stolen posts” (users reposting/plagiarizing others’ content without attribution), and a new algorithm has been built to reduce stolen posts.

Answer the following product/DS questions.

1) What additional information do you need?

Given only the basic post table schema (post id/author/time/type/content/parent), list the additional data you would request to reliably determine whether a post is “stolen.”
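A natural baseline that uses only the six fields in the basic schema is an exact-copy heuristic. The minimal sketch below is a hypothetical Python rendering of that idea, not the platform's method. It treats the earliest post with the same normalized text as the original and flags later copies by other authors that carry no parent link. Its blind spots show why more data is needed. The earliest copy on the platform may itself be copied from off-platform. Image or video posts cannot be compared from text alone. Nothing in the schema records whether the copier credited the source. The class and function names are illustrative assumptions.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Post:
    # Mirrors the basic schema in the question (id/author/time/type/content/parent);
    # the Python field names are hypothetical.
    post_id: str
    author_id: str
    created_at: datetime
    post_type: str
    content: str
    parent_id: Optional[str]


def fingerprint(text: str) -> str:
    """SHA-256 of lowercased, whitespace-collapsed text, used as an exact-copy key."""
    normalized = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def flag_exact_copies(posts: list[Post]) -> dict[str, str]:
    """Map each suspected copy's post_id to the earliest post_id with the same text.

    A later post is flagged only if it has a different author and no parent link,
    i.e. it is not a native repost or reply (a simplifying assumption).
    """
    earliest: dict[str, Post] = {}
    flagged: dict[str, str] = {}
    for post in sorted(posts, key=lambda p: p.created_at):
        key = fingerprint(post.content)
        first = earliest.get(key)
        if first is None:
            earliest[key] = post  # first sighting is presumed original
        elif post.author_id != first.author_id and post.parent_id is None:
            flagged[post.post_id] = first.post_id
    return flagged
```

On a toy feed, a same-text post by a different author an hour later is flagged. A repost that links to its parent, and a re-post by the original author, are not.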

2) What are drawbacks of your methodology?

Assume you propose a detection methodology (rules, ML, similarity search, etc.). Explain key limitations and failure modes.
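One common similarity-search approach is word-shingle Jaccard similarity, a standard near-duplicate technique. It is used here as an illustrative assumption, not as the method the question has in mind. The sketch splits each post into overlapping k-word shingles and compares the sets. The threshold of 0.5 and k = 3 are hypothetical defaults.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word shingles from lowercased, punctuation-stripped text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Short posts collapse to a single (shorter) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_near_duplicate(original: str, candidate: str, threshold: float = 0.5, k: int = 3) -> bool:
    """Flag candidate as a near-copy if its shingle overlap meets the threshold."""
    return jaccard(shingles(original, k), shingles(candidate, k)) >= threshold
```

Two failure modes show up immediately. A one-word edit ("lazy dog" to "lazy cat") still scores 0.75 and is caught. A paraphrase such as "a fast brown fox leaps over a sleepy dog" scores 0.0 against "the quick brown fox jumps over the lazy dog" and evades detection. Short generic phrases like "Good morning, everyone!" score 1.0 even when two users wrote them independently, which produces false positives.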

3) What harms can stolen posts cause?

Describe potential user, creator, and platform harms. Include at least one harm that affects metrics or model feedback loops.

4) How do you evaluate the new algorithm’s effectiveness?

Design an evaluation plan (online experiment or quasi-experiment) to measure whether the algorithm reduces stolen posts without hurting the product.

Your plan should include:

  • A clear primary metric plus diagnostic and guardrail metrics
  • How you will handle confounding (e.g., seasonality, creator mix shifts, enforcement effects)
  • How to validate that the metric truly reflects “stolen post” reduction (label quality / delayed labels)
  • What decision rule you would use to launch/iterate
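The primary-metric test and decision rule could take roughly the following shape. The sketch rests on several assumptions:

  • A user-level randomized experiment. Running both arms over the same calendar window and population is what neutralizes seasonality and creator-mix shifts.
  • The primary metric is stolen-post prevalence (stolen posts / posts), counted only from labels that have matured past a review or audit window.
  • Guardrails such as posts per user are reported as relative changes with standard errors.

All names, the 1% non-inferiority margin, and the thresholds are hypothetical. The z-test also treats posts as independent. Under user-level randomization, a user-clustered or delta-method variance would be more appropriate.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmStats:
    """Per-arm counts; field names are hypothetical."""
    posts: int   # posts created in this arm during the test window
    stolen: int  # of those, posts labeled stolen after labels have matured


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_z(control: ArmStats, treatment: ArmStats) -> tuple[float, float]:
    """Return (treatment rate - control rate, two-sided p-value) from a pooled z-test."""
    p_c = control.stolen / control.posts
    p_t = treatment.stolen / treatment.posts
    pooled = (control.stolen + treatment.stolen) / (control.posts + treatment.posts)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control.posts + 1 / treatment.posts))
    if se == 0.0:
        return p_t - p_c, 1.0  # no stolen posts (or all stolen) in both arms
    z = (p_t - p_c) / se
    return p_t - p_c, 2.0 * (1.0 - normal_cdf(abs(z)))


def launch_decision(
    primary_diff: float,
    primary_p: float,
    guardrails: dict[str, tuple[float, float]],
    alpha: float = 0.05,
    margin: float = 0.01,
    z_crit: float = 1.96,
) -> str:
    """Hypothetical rule returning 'launch', 'iterate', or 'hold'.

    guardrails maps metric name -> (relative change, standard error), signed so a
    negative change is harmful. A guardrail passes if the lower bound of its ~95%
    confidence interval stays above -margin (a non-inferiority check).
    """
    reduced = primary_diff < 0 and primary_p < alpha
    if not reduced:
        return "hold"  # no significant reduction in stolen-post prevalence
    breached = [
        name for name, (change, se) in guardrails.items()
        if change - z_crit * se <= -margin
    ]
    return "iterate" if breached else "launch"
```

For example, suppose control has 1,000 stolen posts out of 100,000 and treatment has 800 out of 100,000. The prevalence difference is -0.002 with p well below 0.001. Whether that ships then depends on the guardrails. A posts-per-user change of -0.2% (SE 0.3%) passes, which gives "launch". A change of -2% breaches the margin, which gives "iterate".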

© 2026 PracHub. All rights reserved.