Design a robust conversion propensity model
Company: Netflix
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Onsite
You are the modeling DS for notifications at a marketplace. Goal: score users daily with a propensity to purchase within 7 days if sent a promo notification today; only top 20% by score will be contacted. Design the model end-to-end:
(1) Labeling and leakage: define a correct label that avoids post-treatment leakage when historical notifications already influenced behavior; handle users with multiple exposures; define negative windows; decide whether to use intent-to-treat or treated-only labels and justify.
(2) Features: propose time-windowed behavioral features, catalog/category signals, price sensitivity, recency/frequency, user–item interactions; specify how to avoid target leakage, enforce time-consistent joins, and mitigate training–serving skew.
(3) Class imbalance and calibration: choose loss, regularization, and calibration method; explain how you will monitor and recalibrate over time.
(4) Offline evaluation: pick metrics (e.g., PR-AUC for ranking, calibration error), construct time-based splits, and design slice analyses for country and tenure.
(5) Causal lift and policy value: with historical logs lacking randomization, propose an approach (e.g., inverse propensity weighting or doubly robust estimation) to estimate incremental revenue of the top-20% policy; describe how you will get propensities and reduce bias (overlap checks, trimming).
(6) Online validation and ramp: define guardrails and primary metrics, traffic split, holdout policy, ramp criteria, and a plan to detect feedback loops and non-stationarity.
(7) Cold start: describe how you will score new users/items on day 0 and backfill training labels over time.
Quick Answer: This question evaluates a candidate's competency in end-to-end propensity modeling, covering correct labeling to avoid post-treatment leakage, time-windowed feature engineering, class-imbalance and calibration strategies, causal uplift estimation, and production concerns such as monitoring, serving consistency, and cold-start scoring.
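Illustrative sketch for part (5): one way to estimate the incremental revenue of the top-20% policy from non-randomized logs is self-normalized inverse propensity weighting with propensity trimming. This is a minimal sketch under stated assumptions, not a reference answer. The names LoggedRow, top_fraction_threshold, snips_value, and incremental_value are hypothetical, and the per-row propensity is assumed to come from a separately fitted model of the historical sending policy. The estimate is only as good as those propensities and the overlap between contacted and uncontacted users.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoggedRow:
    score: float       # model score used for ranking (hypothetical field)
    treated: int       # 1 if a notification was sent in the historical logs
    propensity: float  # assumed estimate of P(treated = 1 | features) under the logging policy
    revenue: float     # observed 7-day revenue


def top_fraction_threshold(scores: List[float], fraction: float = 0.2) -> float:
    """Score cutoff such that roughly the top `fraction` of rows are at or above it."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(round(fraction * len(ranked))))
    return ranked[k - 1]


def snips_value(rows: List[LoggedRow],
                policy: Callable[[LoggedRow], int],
                eps: float = 0.05) -> float:
    """Self-normalized IPW estimate of mean revenue per user under `policy`."""
    num = 0.0
    den = 0.0
    for r in rows:
        # Trimming: drop rows with poor overlap (propensity too close to 0 or 1).
        if not (eps <= r.propensity <= 1.0 - eps):
            continue
        # Only rows where the logged action matches the target policy contribute.
        if policy(r) != r.treated:
            continue
        p_logged = r.propensity if r.treated == 1 else 1.0 - r.propensity
        w = 1.0 / p_logged
        num += w * r.revenue
        den += w
    if den == 0.0:
        raise ValueError("no usable rows after trimming; overlap is insufficient")
    return num / den


def incremental_value(rows: List[LoggedRow],
                      fraction: float = 0.2,
                      eps: float = 0.05) -> float:
    """Estimated revenue per user of 'contact top fraction' minus 'contact nobody'."""
    thr = top_fraction_threshold([r.score for r in rows], fraction)
    contact_top = lambda r: 1 if r.score >= thr else 0
    contact_nobody = lambda r: 0
    return snips_value(rows, contact_top, eps) - snips_value(rows, contact_nobody, eps)
```

Self-normalization trades a small bias for lower variance than plain IPW, and trimming restricts the estimate to the population with adequate overlap, so the result should be reported alongside the share of rows that were dropped. A doubly robust variant would add an outcome model on top of the same weights.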