Build Naive Bayes spam classifier with F1

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of probabilistic text classification with Naive Bayes, text preprocessing and feature extraction, and use of the F1 score for performance measurement in a binary spam detection task.

Company: Disney

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

Nov 1, 2025, 12:00 AM

You are given a text classification dataset for spam detection (binary labels: spam vs not_spam) in a Jupyter notebook environment.

Task

  1. Preprocess the text (basic cleaning/tokenization is sufficient).
  2. Convert text to features suitable for Naive Bayes (e.g., bag-of-words or TF-IDF).
  3. Train a Naive Bayes classifier.
  4. Evaluate the model using F1 score (clearly state whether it is the F1 for the positive class or a specific averaging scheme).
  5. Run the trained model on a few test examples and show predicted labels (and optionally probabilities).
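
One possible way to work through the five steps is sketched below with scikit-learn. It is an illustration under stated assumptions, not a reference solution: the tiny `texts` / `labels` lists are hypothetical stand-ins for the real dataset, bag-of-words with `MultinomialNB` is just one reasonable choice, and the reported F1 is the binary F1 for the positive class `spam`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the real dataset (4 spam, 8 not_spam).
texts = [
    "Win a free prize now", "Cheap loans, click here", "Free entry, claim your prize",
    "Urgent: claim free cash now", "Are we still meeting at noon?", "See you at lunch",
    "Please review the attached report", "Can you call me tonight?",
    "Thanks for the notes from class", "The meeting moved to Friday",
    "Happy birthday, see you soon", "I will send the report tomorrow",
]
labels = ["spam"] * 4 + ["not_spam"] * 8

# Step 0: stratified split so both classes appear in train and validation.
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Steps 1-3: basic cleaning/tokenization (lowercasing, accent stripping),
# bag-of-words counts, and multinomial Naive Bayes with Laplace smoothing.
# Fitting the pipeline on X_train only keeps the vocabulary free of validation data.
model = make_pipeline(
    CountVectorizer(lowercase=True, strip_accents="unicode"),
    MultinomialNB(alpha=1.0),
)
model.fit(X_train, y_train)

# Step 4: F1 for the positive class "spam" (average="binary" is the default).
val_pred = model.predict(X_val)
spam_f1 = f1_score(y_val, val_pred, pos_label="spam", zero_division=0)
print(f"F1 (positive class = spam): {spam_f1:.3f}")

# Step 5: predictions and spam probabilities for a few unseen examples.
examples = ["Claim your free prize", "Lunch on Friday?", "Click here for cheap cash"]
pred = model.predict(examples)
proba = model.predict_proba(examples)
spam_col = list(model.classes_).index("spam")
for text, label, p_spam in zip(examples, pred, proba[:, spam_col]):
    print(f"{label:>8}  P(spam)={p_spam:.2f}  {text}")
```

Wrapping the vectorizer and classifier in a single pipeline means the vocabulary is learned only from the training split, and the same transformation is reused unchanged at prediction time.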

Constraints / Notes

  • The dataset may be class-imbalanced.
  • You should avoid data leakage (fit text vectorizer only on training data).
  • You may choose reasonable train/validation splitting if only one labeled set is provided.
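
The class-imbalance note is the reason the task asks which F1 is being reported. The sketch below uses only the standard library and a hypothetical validation set (2 spam, 8 not_spam) scored against a majority-class baseline that never predicts spam; the helper `f1_for_class` is an illustrative name, not part of any library.

```python
def f1_for_class(y_true, y_pred, positive):
    """F1 treating `positive` as the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced validation labels and a baseline that always says not_spam.
y_true = ["spam"] * 2 + ["not_spam"] * 8
y_pred = ["not_spam"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
spam_f1 = f1_for_class(y_true, y_pred, "spam")
not_spam_f1 = f1_for_class(y_true, y_pred, "not_spam")
macro_f1 = (spam_f1 + not_spam_f1) / 2

print(f"accuracy = {accuracy:.2f}")  # 0.80
print(f"spam F1  = {spam_f1:.2f}")   # 0.00
print(f"macro F1 = {macro_f1:.2f}")  # 0.44
```

The baseline looks acceptable on accuracy but scores zero on positive-class F1, while macro F1 lands in between. That gap is why the averaging scheme should be stated alongside the number.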
