What features and feature selection would you use?
Company: Meta
Role: Data Scientist
Category: Machine Learning
Difficulty: Medium
Interview Round: Technical Screen
## Context
You are building an ML system to rank and promote **shop ads** on an e-commerce feed or search page. At serving time, the system scores candidate shop ads for a given user and context.
Assume you have access to:
- User events (impressions, clicks, purchases, shop follows)
- Shop metadata (category, price bands, inventory signals)
- Query/context (search query, time, device)
- Ad/auction signals (bid, budget pacing)
## Questions
1. If you were to build the **shop-ads ranking model**, what feature families would you use? (Give examples.)
2. You have “a ton” of candidate features. How would you identify which ones are **useful**?
   - Include at least one **offline** approach and one **online** (production-safe) approach.
3. If you were **not allowed to use a model-based importance method** (e.g., no SHAP, GBDT gain, or permutation importance), how would you still find the key useful features?
4. Call out common pitfalls: leakage, feedback loops, cold start, and feature drift.
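
For question 3, one possible model-free screen is a univariate filter: score each candidate feature by its mutual information with the click label and rank the results. The sketch below is only an illustration under stated assumptions. The feature names, the toy click data, and the helper functions `mutual_information` and `rank_features` are all hypothetical and not part of the question; continuous features are assumed to have been binned into discrete values beforehand.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Estimate I(X; Y) in bits from paired discrete observations."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    px = Counter(feature_values)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written in terms of raw counts
        ratio = p_xy * n * n / (px[x] * py[y])
        mi += p_xy * math.log2(ratio)
    return mi


def rank_features(feature_columns, labels):
    """Return (name, score) pairs sorted by mutual information, highest first."""
    scores = {
        name: mutual_information(column, labels)
        for name, column in feature_columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up data: one row per ad impression (hypothetical feature names).
clicks = [1, 1, 0, 0, 1, 0, 1, 0]
features = {
    "user_follows_shop":      [1, 1, 0, 0, 1, 0, 1, 0],  # matches the label exactly
    "query_matches_category": [1, 1, 0, 0, 1, 0, 0, 1],  # partially informative
    "device_is_mobile":       [1, 0, 1, 0, 1, 0, 0, 1],  # independent of the label
}

ranking = rank_features(features, clicks)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

A univariate filter like this scores each feature in isolation, so it can miss features that only matter in interaction with others and will not flag redundancy between correlated features. It also rewards leaky features, so a suspiciously high score (such as the perfect match in the toy data) is worth checking against the leakage pitfall in question 4.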
Quick Answer: This question tests feature engineering, feature selection, and production-aware machine learning system design for ranking shop ads.