Predict Monthly Job-Change Risk (Discrete-Time Survival Setup)
Context
You are building a monthly model to predict the probability that a LinkedIn member will change jobs in a given month. The output will be used across member notifications, job-ad targeting, and recruiter tools.
Tasks
- Features
  - Specify features across:
    - Profile and job history (e.g., school, major, degree, normalized title/seniority, tenure in role/company, number of past moves/promotions, industry, company size, location/remote).
    - Member behavior (e.g., recent profile edits, job searches and applications, views of job/company pages, job alerts, connections or messages with recruiters/HR, saved jobs, resume uploads, InMail activity, network growth, days active).
    - Seasonality (e.g., month-of-year, graduation peaks, fiscal cycles, industry-specific hiring seasons); a month-encoding sketch follows this list.
    - External demand (e.g., postings volume by title/geo, applicant-per-opening ratio, trend in postings, time-to-fill, unemployment rate, layoffs/news signals).
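For the seasonality bucket, one concrete encoding is to place month-of-year on the unit circle so that December and January end up adjacent rather than 11 units apart. A minimal sketch, assuming a pandas user–month frame with a hypothetical `snapshot_month` column:

```python
import numpy as np
import pandas as pd

def add_month_cyclical(df: pd.DataFrame, month_col: str = "snapshot_month") -> pd.DataFrame:
    """Encode month-of-year as (sin, cos) on the unit circle so Dec and Jan are neighbors."""
    m = pd.to_datetime(df[month_col]).dt.month  # 1..12
    out = df.copy()
    out["month_sin"] = np.sin(2 * np.pi * (m - 1) / 12)
    out["month_cos"] = np.cos(2 * np.pi * (m - 1) / 12)
    return out
```

Tree-based models can usually consume raw month-of-year directly; the cyclical form mainly helps linear and neural models.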
- Label, data structure, and sampling
  - Define a hazard-style label on user–month rows: 1 if the user changes jobs in that month (and had not changed before), otherwise 0; censor rows after the first change or after the member goes inactive (see the panel-construction sketch after this list).
  - Guard against leakage (e.g., ensure features use only data available before the prediction month starts; handle backfilled profile updates via event timestamps rather than current profile state; exclude post-change behavior).
  - Address class imbalance and dataset size (e.g., downsample negatives with re-weighting or use focal loss; manage right-censoring); a downsampling sketch also follows.
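In discrete time, the hazard for month t is h_t = P(first change in month t | no change before t), and each user–month row is one Bernoulli observation of it. A minimal pandas sketch of the panel construction and the as-of feature join, assuming hypothetical inputs `members` (member_id, censor_month), `job_changes` (member_id, change_month for the first change only), and `feature_snapshots` (member_id, snapshot_time, feature columns), with months represented as pandas monthly Periods:

```python
import pandas as pd

def build_hazard_panel(members: pd.DataFrame,
                       job_changes: pd.DataFrame,
                       start: str,
                       end: str) -> pd.DataFrame:
    """One row per member-month: label=1 in the month of the first job change,
    0 before it; no rows are emitted after the change (event occurred) or after
    censor_month (right-censoring, e.g. the member went inactive)."""
    first_change = job_changes.set_index("member_id")["change_month"]
    months = pd.period_range(start, end, freq="M")
    rows = []
    for r in members.itertuples():
        change = first_change.get(r.member_id)  # None if no change observed in window
        for m in months:
            if m > r.censor_month:              # right-censored: stop emitting rows
                break
            rows.append({"member_id": r.member_id,
                         "month": m,
                         "label": int(change is not None and m == change)})
            if change is not None and m >= change:  # exclude post-change rows
                break
    return pd.DataFrame(rows)

def join_features_as_of(panel: pd.DataFrame,
                        feature_snapshots: pd.DataFrame) -> pd.DataFrame:
    """Leakage guard: attach the latest feature snapshot strictly *before* each
    month start (allow_exact_matches=False), so nothing from inside the
    prediction month can leak into features."""
    panel = panel.assign(month_start=panel["month"].dt.to_timestamp())
    return pd.merge_asof(
        panel.sort_values("month_start"),
        feature_snapshots.sort_values("snapshot_time"),
        left_on="month_start",
        right_on="snapshot_time",
        by="member_id",
        allow_exact_matches=False)
```

Backfilled profile updates are handled by keying snapshots on the edit's event time rather than on current profile state, so a later backfill can never appear in an earlier month's features.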
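Because monthly job changes are rare, the panel is dominated by negatives. A common recipe, sketched below under the same hypothetical `panel` schema, is to keep every positive, keep each negative with some probability, and up-weight the kept negatives so probability estimates stay roughly calibrated (the `weight` column would be passed as `sample_weight` to the learner):

```python
import numpy as np
import pandas as pd

def downsample_negatives(panel: pd.DataFrame,
                         neg_keep_rate: float = 0.05,
                         seed: int = 0) -> pd.DataFrame:
    """Keep all positives; keep each negative with probability neg_keep_rate
    and give it weight 1/neg_keep_rate so the weighted sample matches the
    full panel in expectation."""
    rng = np.random.default_rng(seed)
    pos = panel[panel["label"] == 1].assign(weight=1.0)
    neg = panel[panel["label"] == 0]
    neg = neg[rng.random(len(neg)) < neg_keep_rate].assign(weight=1.0 / neg_keep_rate)
    return pd.concat([pos, neg], ignore_index=True)
```

If negatives are instead dropped without re-weighting, the model's scores need an intercept correction before they can be read as monthly probabilities.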
- Evaluation and fairness
  - Choose metrics: PR-AUC as primary, plus recall at a fixed precision, calibration (Brier score, reliability curves), and time-split backtests (rolling-origin); see the evaluation sketch after this list.
  - Include fairness checks across key groups (e.g., region, industry, seniority) for performance and calibration parity; a per-group report sketch follows as well.
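A minimal evaluation sketch with scikit-learn, assuming arrays of hazard labels `y_true` and predicted monthly probabilities `y_score`; the rolling-origin helper yields train/test month splits that always test strictly after the training window:

```python
from sklearn.metrics import (average_precision_score, brier_score_loss,
                             precision_recall_curve)

def recall_at_precision(y_true, y_score, min_precision: float = 0.5) -> float:
    """Best recall achievable at any threshold whose precision >= min_precision."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    ok = precision >= min_precision
    return float(recall[ok].max()) if ok.any() else 0.0

def evaluate(y_true, y_score) -> dict:
    return {
        "pr_auc": average_precision_score(y_true, y_score),  # primary metric
        "recall@p50": recall_at_precision(y_true, y_score, 0.5),
        "brier": brier_score_loss(y_true, y_score),          # calibration
    }

def rolling_origin_splits(months, n_test: int = 1):
    """Train on months[:i], test on the next n_test months; slide the origin."""
    for i in range(1, len(months) - n_test + 1):
        yield months[:i], months[i:i + n_test]
```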
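For the fairness checks, the same metrics can be sliced per group: large gaps in PR-AUC flag performance disparities, and gaps in Brier score flag calibration disparities. A sketch, assuming a scored frame with hypothetical `label`, `score`, and group columns:

```python
import pandas as pd
from sklearn.metrics import average_precision_score, brier_score_loss

def group_report(scored: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Per-group PR-AUC and Brier score for parity checks across e.g. region,
    industry, or seniority."""
    rows = []
    for group, part in scored.groupby(group_col):
        if part["label"].nunique() < 2:
            continue  # PR-AUC is undefined without both classes present
        rows.append({group_col: group,
                     "n": len(part),
                     "pr_auc": average_precision_score(part["label"], part["score"]),
                     "brier": brier_score_loss(part["label"], part["score"])})
    return pd.DataFrame(rows)
```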
- Product use and experimentation
  - Describe how predictions are used in:
    - Member notifications.
    - Job-ad targeting.
    - Recruiter tools (candidate discovery/prioritization).
  - Propose an A/B testing plan to measure business impact and avoid feedback loops (e.g., shadow scoring, persistent holdouts, treatment indicators in training, cluster randomization, IPS/doubly-robust evaluation); see the holdout and IPS sketch after this list.
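Two of those ingredients are easy to make concrete. A persistent holdout can be implemented as deterministic hash bucketing, so the same members stay out of model-driven treatments for the life of the program (a long-run control and a source of unconfounded retraining labels); and if the serving policy logs its action propensities, logged traffic supports inverse-propensity-scored offline evaluation of candidate policies. A sketch with hypothetical constants:

```python
import hashlib
import numpy as np

HOLDOUT_RATE = 0.05  # hypothetical: 5% of members never receive model-driven treatments

def in_persistent_holdout(member_id: str, salt: str = "job-change-v1") -> bool:
    """Deterministic hash bucketing: stable across days and surfaces, so the
    holdout population never churns."""
    h = hashlib.sha256(f"{salt}:{member_id}".encode()).hexdigest()
    return int(h[:8], 16) / 0xFFFFFFFF < HOLDOUT_RATE

def ips_estimate(reward: np.ndarray,
                 new_propensity: np.ndarray,
                 logged_propensity: np.ndarray) -> float:
    """Inverse-propensity-scored value of a candidate policy from logged
    randomized traffic: mean(reward * pi_new / pi_logged)."""
    w = new_propensity / np.clip(logged_propensity, 1e-6, None)
    return float(np.mean(reward * w))
```

A doubly-robust estimator would add a learned reward model to reduce the variance of the raw IPS weights; cluster randomization (e.g., by company or recruiter) guards against interference when one member's treatment affects another's outcome.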