Build a predictive model for feature rollout targeting
Company: Meta
Role: Data Scientist
Category: Machine Learning
Difficulty: Medium
Interview Round: Technical Screen
Before global launch, you want to predict which users or products would benefit most from the 'More like this' button so you can stage rollout.
Design an end-to-end modeling approach using only pre-launch data from the interactions/products schema and any new logging you can add at launch:
1) Labeling: propose a proxy label available pre-launch (e.g., user propensity to explore similar items via existing flows) and a post-launch true label (uplift in exploration rate or interaction_count per user). Explain how you will avoid target leakage, especially from features derived too close to the labeling window.
2) Features: list concrete user, product, and interaction features you’ll engineer (e.g., recency/frequency of category interactions, dwell-time proxies via interaction_count sequences, country, seller diversity). Include cross-features (user×category affinity) and cold-start strategies.
3) Models: choose two candidate model families (e.g., calibrated gradient-boosted trees and a sparse logistic regression). Explain when a deep model would be justified (sample size, feature types) and how you’d ensure calibration for decisioning.
4) Evaluation: define offline splits (time-based, user-level holdout), metrics (ROC AUC and PR AUC for ranking; Qini/uplift AUC if modeling treatment uplift), and how you’ll run prospective shadow testing once telemetry exists.
5) Decisioning: specify a thresholding or budgeted targeting policy and show how you’d simulate business impact under deployment constraints. Address fairness across countries and guard against negative spillovers (e.g., reduced variety). Include a plan for periodic retraining and drift monitoring.
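One way to make the leakage concern in step 1 concrete is to separate the feature window from the label window with an explicit gap. The sketch below is illustrative only: the event tuple layout `(user_id, timestamp, interaction_count)`, the function name `build_examples`, and the window lengths are assumptions, not part of the interactions/products schema given in the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_examples(events, cutoff, feature_days=28, gap_days=7, label_days=14):
    """Build (features, proxy label) rows with a gap between the two windows.

    events: iterable of (user_id, timestamp, interaction_count) tuples
            (hypothetical layout, assumed for this sketch).
    Features use [cutoff - gap - feature_days, cutoff - gap).
    The proxy label uses [cutoff, cutoff + label_days).
    """
    feat_end = cutoff - timedelta(days=gap_days)
    feat_start = feat_end - timedelta(days=feature_days)
    label_end = cutoff + timedelta(days=label_days)

    feats = defaultdict(lambda: {"n_events": 0, "total_count": 0, "last_seen": None})
    label_counts = defaultdict(int)
    for user_id, ts, count in events:
        if feat_start <= ts < feat_end:
            f = feats[user_id]
            f["n_events"] += 1
            f["total_count"] += count
            if f["last_seen"] is None or ts > f["last_seen"]:
                f["last_seen"] = ts
        elif cutoff <= ts < label_end:
            label_counts[user_id] += count
        # Events inside the gap, or outside both windows, are ignored.

    rows = []
    for user_id, f in feats.items():
        rows.append({
            "user_id": user_id,
            "n_events": f["n_events"],
            "total_count": f["total_count"],
            "recency_days": (feat_end - f["last_seen"]).days,
            # Proxy label: any interaction in the label window.
            "label": int(label_counts.get(user_id, 0) > 0),
        })
    return rows
```

Users with no events in the feature window produce no row here; in practice they would go through whatever cold-start path the design specifies. Sliding `cutoff` forward in time also yields the time-based splits asked for in step 4.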
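For the uplift metrics named in step 4, a Qini curve can be computed directly from a randomized post-launch sample. The following is a minimal sketch, assuming per-user model scores, a binary treatment flag, and an observed outcome; the function names are hypothetical, and it uses the common form Q(k) = Y_t(k) - Y_c(k) * N_t(k) / N_c(k) over the top-k users by score.

```python
def qini_curve(scores, treated, outcomes):
    """Cumulative Qini values after each user, ranked by score descending.

    Q(k) = Y_t(k) - Y_c(k) * N_t(k) / N_c(k), where Y_* are outcome sums and
    N_* are user counts among the top-k users in treatment and control.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    y_t = y_c = 0.0
    n_t = n_c = 0
    curve = []
    for i in order:
        if treated[i]:
            n_t += 1
            y_t += outcomes[i]
        else:
            n_c += 1
            y_c += outcomes[i]
        # With no control users seen yet, y_c is 0, so the scaled term is 0.
        scale = n_t / n_c if n_c else 0.0
        curve.append(y_t - y_c * scale)
    return curve


def qini_coefficient(scores, treated, outcomes):
    """Mean gap between the Qini curve and the straight line to its endpoint.

    The straight line stands in for random targeting; a positive value
    suggests the ranking concentrates uplift near the top.
    """
    curve = qini_curve(scores, treated, outcomes)
    n = len(curve)
    total = curve[-1]
    random_line = [total * (k + 1) / n for k in range(n)]
    return sum(c - r for c, r in zip(curve, random_line)) / n
```

This only has a causal reading when treatment was randomized, for example in the staged rollout itself; on observational data the same number is just a ranking diagnostic.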
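The budgeted targeting policy in step 5 can be prototyped as a greedy selection with a per-country floor. This is a sketch under stated assumptions: `pred_uplift` is taken to be a calibrated per-user uplift estimate, and the dict keys, the function names, and the `min_share_per_country` parameter are all hypothetical choices for illustration.

```python
def budgeted_targeting(users, budget, min_share_per_country=0.0):
    """Pick up to `budget` users, reserving a floor of slots per country.

    users: list of dicts with 'user_id', 'country', 'pred_uplift'
           (hypothetical layout; pred_uplift assumed calibrated).
    Users with non-positive predicted uplift are never targeted.
    """
    eligible = sorted(
        (u for u in users if u["pred_uplift"] > 0),
        key=lambda u: u["pred_uplift"],
        reverse=True,
    )
    countries = sorted({u["country"] for u in users})
    floor = int(budget * min_share_per_country)
    if countries:
        # Keep the reserved slots from exceeding the total budget.
        floor = min(floor, budget // len(countries))

    chosen, chosen_ids = [], set()
    # Pass 1: best eligible users per country, up to the floor.
    for country in countries:
        in_country = [u for u in eligible if u["country"] == country]
        for u in in_country[:floor]:
            chosen.append(u)
            chosen_ids.add(u["user_id"])
    # Pass 2: fill the remaining budget by predicted uplift overall.
    for u in eligible:
        if len(chosen) >= budget:
            break
        if u["user_id"] not in chosen_ids:
            chosen.append(u)
            chosen_ids.add(u["user_id"])
    return chosen


def expected_impact(chosen):
    """Sum of predicted uplift over targeted users (a simulated, not observed, impact)."""
    return sum(u["pred_uplift"] for u in chosen)
```

Comparing `expected_impact` with and without the floor gives a rough estimate of what the fairness constraint costs in predicted uplift, which is one input to the business-impact simulation the question asks for.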
Quick Answer: This question evaluates a data scientist's competency in end-to-end predictive modeling for targeted feature rollout, covering labeling strategy, feature engineering, model selection, evaluation, decisioning, and monitoring.