Detecting Clickbait Ads Without Labeled Data
Context
You are asked to detect clickbait ad creatives when there is no labeled training data. You have impression/click logs, post-click signals (e.g., dwell time, bounce), and metadata (creative text/image, destination/publisher). The goal is to bootstrap weak labels from behavior, then train a supervised model that avoids over-blocking legitimate, high-quality novelty and is robust to adversarial gaming.
Tasks
- Feature engineering: Propose features that capture short-term CTR and its decay among users who previously clicked a creative (e.g., 14-day within-viewer CTR delta), as well as post-click signals (dwell time on landing page, bounce).
- Weak labeling via clustering: Cluster creatives in the CTR × decay space. Specify how you select the "likely clickbait" cluster while minimizing over-blocking of high-quality novelty.
- Supervised model: Convert weak labels into a supervised model. Describe model features (e.g., text embeddings, historical user–ad interaction, publisher quality), training regimen, evaluation metrics (e.g., precision@k, uplift on long-term engagement), and guardrails.
- Abuse prevention and monitoring: Outline how to prevent gaming (e.g., rotating creatives) and how to monitor for regression to the mean over time.
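
To make the first two tasks concrete, here is a minimal Python sketch of one possible approach, not a reference solution. It assumes per-creative aggregates are already computed from the logs. The `CreativeStats` fields, the relative-decay definition, the choice of two clusters, and the `min_dwell_s` and `max_bounce` thresholds are all hypothetical choices made for illustration. The sketch clusters creatives in standardized CTR × decay space with a plain 2-means, treats the cluster with the higher CTR and decay as suspect, and assigns a weak positive label only when post-click signals are also poor, so that novel creatives with healthy dwell time are not flagged.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class CreativeStats:
    # All fields are hypothetical aggregates derived from the logs.
    creative_id: str
    first_ctr: float       # CTR on a viewer's first exposure
    repeat_ctr: float      # CTR among prior clickers within 14 days
    median_dwell_s: float  # median landing-page dwell time, in seconds
    bounce_rate: float     # share of clicks that bounce


def ctr_decay(s: CreativeStats) -> float:
    """Relative within-viewer CTR drop; 0.0 when first_ctr is not positive."""
    if s.first_ctr <= 0:
        return 0.0
    return (s.first_ctr - s.repeat_ctr) / s.first_ctr


def zscores(values):
    """Standardize a list of floats; all zeros if there is no variance."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def two_means(points, iters=50):
    """Plain 2-means on 2-D points with a deterministic initialization."""
    centers = [min(points, key=sum), max(points, key=sum)]
    assign = [0] * len(points)
    for _ in range(iters):
        new_assign = [
            min((0, 1), key=lambda k: (p[0] - centers[k][0]) ** 2
                + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        for k in (0, 1):
            members = [p for p, a in zip(points, new_assign) if a == k]
            if members:  # keep the old center if a cluster empties out
                centers[k] = (mean(m[0] for m in members),
                              mean(m[1] for m in members))
        if new_assign == assign:
            break
        assign = new_assign
    return assign, centers


def weak_labels(stats, min_dwell_s=10.0, max_bounce=0.7):
    """Return {creative_id: 0 or 1}; 1 marks a weak 'likely clickbait' label."""
    points = list(zip(zscores([s.first_ctr for s in stats]),
                      zscores([ctr_decay(s) for s in stats])))
    assign, centers = two_means(points)
    # Suspect cluster: the one whose centroid has higher CTR plus decay.
    suspect = max((0, 1), key=lambda k: sum(centers[k]))
    labels = {}
    for s, a in zip(stats, assign):
        # Guardrail: require poor post-click behavior as well, so that
        # high-CTR novelty with healthy dwell time is not weakly labeled.
        poor_post_click = (s.median_dwell_s < min_dwell_s
                           and s.bounce_rate > max_bounce)
        labels[s.creative_id] = int(a == suspect and poor_post_click)
    return labels


# Made-up sample rows, for illustration only.
SAMPLE = [
    CreativeStats("bait1", 0.12, 0.01, 3.0, 0.90),
    CreativeStats("bait2", 0.10, 0.02, 4.0, 0.85),
    CreativeStats("novel", 0.11, 0.03, 45.0, 0.20),
    CreativeStats("steady1", 0.02, 0.018, 30.0, 0.30),
    CreativeStats("steady2", 0.03, 0.028, 25.0, 0.35),
]
print(weak_labels(SAMPLE))
```

On this made-up sample, `novel` falls in the same high-CTR, high-decay cluster as the two bait creatives but is not labeled, because its dwell time and bounce rate are healthy. In practice the number of clusters, the thresholds, and the clustering method itself would need validation against a small hand-reviewed sample before the weak labels feed a supervised model.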