
Detect clickbait without labels, then supervise

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in weak supervision, feature engineering from behavioral signals, model evaluation, and robustness to adversarial manipulation within the Machine Learning domain.

  • hard
  • Other
  • Machine Learning
  • Data Scientist

Company: Other

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

You lack labeled data for clickbait ads. 1) Engineer features that capture short‑term CTR and its decay among users who previously clicked (e.g., 14‑day within‑viewer CTR delta, dwell time on landing, bounce). 2) Cluster creatives in the CTR×Decay space and specify how you choose the 'likely clickbait' cluster without over‑blocking high‑quality novelty. 3) Convert these weak labels into a supervised model; describe features (text embeddings, historical user‑ad interaction, publisher quality), training regimen, evaluation metrics (precision@k, uplift on long‑term engagement), and guardrails. 4) Outline how to prevent gaming (e.g., rotating creatives) and monitor for regression to the mean.

Oct 13, 2025, 9:49 PM

Detecting Clickbait Ads Without Labeled Data

Context

You are asked to detect clickbait ad creatives when there is no labeled training data. You have impression/click logs, post-click signals (e.g., dwell time, bounce), and metadata (creative text/image, destination/publisher). The goal is to bootstrap weak labels from behavior, then train a supervised model that avoids over-blocking legitimate, high-quality novelty and is robust to adversarial gaming.

Tasks

  1. Feature engineering: Propose features that capture short-term CTR and its decay among users who previously clicked a creative (e.g., 14-day within-viewer CTR delta), as well as post-click signals (dwell time on landing page, bounce).
  2. Weak labeling via clustering: Cluster creatives in the CTR × decay space. Specify how you select the "likely clickbait" cluster while minimizing over-blocking of high-quality novelty.
  3. Supervised model: Convert weak labels into a supervised model. Describe model features (e.g., text embeddings, historical user–ad interaction, publisher quality), training regimen, evaluation metrics (e.g., precision@k, uplift on long-term engagement), and guardrails.
  4. Abuse prevention and monitoring: Outline how to prevent gaming (e.g., rotating creatives) and how to monitor for regression to the mean over time.
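
Tasks 1 and 2 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not a reference solution: the event tuple layout, the 3-day early/late split of the 14-day window, the 5-second bounce cutoff, the `min_bounce` guardrail, and all function names are hypothetical choices made for the example. It computes a within-viewer CTR decay per creative (restricted to users who clicked at least once), clusters creatives in the CTR × decay plane with a tiny deterministic k-means, and flags only those creatives in the high-CTR, high-decay cluster that also bounce heavily, so that novel creatives with healthy dwell time are not blocked.

```python
from collections import defaultdict
from statistics import mean

BOUNCE_DWELL_S = 5  # assumption: a click with under 5 s on the landing page is a bounce
EARLY_DAYS = 3      # assumption: days 0-2 are "early", days 3-13 are "late"


def creative_features(events):
    """events: (creative_id, user_id, day, clicked, dwell_s) tuples, where day is
    days since that user's first impression of the creative (0..13)."""
    per_user = defaultdict(list)
    for cid, uid, day, clicked, dwell in events:
        per_user[(cid, uid)].append((day, clicked, dwell))

    agg = defaultdict(lambda: {"early": [], "late": [], "dwell": []})
    for (cid, _uid), rows in per_user.items():
        if not any(clicked for _, clicked, _ in rows):
            continue  # decay is measured only among users who clicked at least once
        for day, clicked, dwell in rows:
            agg[cid]["early" if day < EARLY_DAYS else "late"].append(int(clicked))
            if clicked:
                agg[cid]["dwell"].append(dwell)

    feats = {}
    for cid, a in agg.items():
        if not a["early"] or not a["late"]:
            continue  # need exposure in both windows to measure decay
        early, late = mean(a["early"]), mean(a["late"])
        feats[cid] = {
            "early_ctr": early,
            "decay": early - late,  # within-viewer CTR delta
            "bounce_rate": mean([int(d < BOUNCE_DWELL_S) for d in a["dwell"]]),
        }
    return feats


def _d2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_2d(points, k=2, iters=20):
    # Farthest-point initialisation keeps the sketch deterministic.
    centers = [min(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_d2(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _d2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = (mean([m[0] for m in members]),
                              mean([m[1] for m in members]))
    return labels, centers


def flag_likely_clickbait(feats, k=2, min_bounce=0.6):
    ids = list(feats)
    points = [(feats[c]["early_ctr"], feats[c]["decay"]) for c in ids]
    labels, centers = kmeans_2d(points, k)
    # Candidate cluster: highest early CTR plus decay at its centre.
    hot = max(range(k), key=lambda j: centers[j][0] + centers[j][1])
    # Guardrail: a weak positive label also requires a high bounce rate.
    return [c for c, lab in zip(ids, labels)
            if lab == hot and feats[c]["bounce_rate"] >= min_bounce]


# Synthetic demo data: two users per creative.
events = []
for uid in ("u1", "u2"):
    # "bait": one early click that bounces, then no further clicks
    events += [("bait", uid, 0, True, 2), ("bait", uid, 5, False, None),
               ("bait", uid, 9, False, None)]
    # "novel": one early click with long dwell, then no further clicks
    events += [("novel", uid, 0, True, 60), ("novel", uid, 5, False, None),
               ("novel", uid, 9, False, None)]
    # "steady": clicks spread over time with long dwell
    events += [("steady", uid, 0, False, None), ("steady", uid, 1, True, 40),
               ("steady", uid, 5, True, 40), ("steady", uid, 9, False, None)]

feats = creative_features(events)
flagged = flag_likely_clickbait(feats)
```

In the synthetic data, "bait" and "novel" land in the same CTR × decay cluster, and only the bounce guardrail separates them. That is the design point: CTR decay alone cannot distinguish clickbait from legitimate novelty, so the weak label combines cluster membership with a post-click quality signal. In practice the thresholds would need calibration against a small hand-reviewed sample, and a library clustering implementation would likely replace the toy k-means.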

