Detail NLP preprocessing and n‑gram choices
Company: Thumbtack
Role: Data Scientist
Category: Machine Learning
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates a data scientist's competency in NLP preprocessing and feature engineering. It covers modality-specific text normalization, tokenization and subword choices, n-gram selection and its sparsity trade-offs, handling of out-of-vocabulary (OOV) terms, emojis, URLs, and code, and empirical validation and model comparison.
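To make the n-gram sparsity trade-off concrete, here is a minimal sketch using only the Python standard library. The `<url>` placeholder, the regexes, and the helper names (`normalize`, `tokenize`, `ngrams`, `singleton_fraction`) are illustrative assumptions, not part of the original question. The sketch measures what fraction of distinct n-grams occur exactly once, which tends to rise as n grows.

```python
import re
from collections import Counter

# Assumed patterns: a simple URL matcher and a tokenizer that keeps the
# placeholder, word runs, and single punctuation/emoji characters.
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"<url>|\w+|[^\w\s]")


def normalize(text):
    """Lowercase the text and replace URLs with a '<url>' placeholder."""
    return URL_RE.sub("<url>", text.lower())


def tokenize(text):
    """Split normalized text into word, punctuation, and placeholder tokens."""
    return TOKEN_RE.findall(normalize(text))


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def singleton_fraction(docs, n):
    """Fraction of distinct n-grams that occur exactly once in the corpus."""
    counts = Counter(g for d in docs for g in ngrams(tokenize(d), n))
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)


if __name__ == "__main__":
    docs = ["fix my sink", "fix my roof"]
    for n in (1, 2, 3):
        print(n, singleton_fraction(docs, n))
```

On this toy two-document corpus the singleton fraction increases from unigrams to trigrams, which is the usual argument for capping n or pruning rare n-grams by a minimum document frequency. On real data the cutoff is best chosen by validation rather than by this statistic alone.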