Evaluating a LinkedIn Jobs Recommender Upgrade
LinkedIn is upgrading the algorithm that recommends jobs to members across surfaces such as the Jobs tab, homepage modules, and notifications. This is a two-sided marketplace: member outcomes must improve without harming employer outcomes.
Assume logs include impressions, ranked positions, clicks, saves, apply starts and completions, latency, employer response events, and the eligibility set for each request.
Constraints & Assumptions
- Include offline and online metrics.
- Account for marketplace effects between members and employers.
- Define exposure and eligibility carefully.
- Include power, diagnostics, and launch decision criteria.
Clarifying Questions to Ask
- Which surface is the primary launch surface?
- What is the primary objective: applies, qualified applies, member satisfaction, employer outcomes, or marketplace efficiency?
- Are jobs inventory-constrained or impacted by treatment allocation?
- Can we log randomized candidate sets or propensities for counterfactual evaluation?
Part 1 - Metrics
Which offline and online metrics would you track?
What This Part Should Cover
- Offline: NDCG@K, MAP, recall@K, calibration, coverage, diversity, cold-start slices, and counterfactual estimates if available.
- Online member metrics: CTR, saves, apply starts, completed applies, qualified applies, long-term job-seeker retention, and satisfaction.
- Employer metrics: application quality, distribution of applications across jobs, response rate, fill rate, and concentration of applications on a small set of jobs.
- System guardrails: latency, errors, notification fatigue, unsubscribes, and fairness.
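As a rough illustration of the offline ranking metrics listed above, the sketch below computes NDCG@K and recall@K for a single request. The graded relevance scale (0 = no action, 1 = click, 2 = save, 3 = completed apply) and the function names are assumptions made for this example; the prompt does not specify a labeling scheme.

```python
import math
from typing import Sequence, Set


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions, using exponential gain."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based: discount is log2(position + 1)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """NDCG@K, normalized by the best possible ordering of the same labels.

    Assumption: the ideal ranking is built only from the jobs that were shown,
    not from the full eligibility set.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def recall_at_k(ranked_job_ids: Sequence[str], relevant_job_ids: Set[str], k: int) -> float:
    """Fraction of the relevant jobs that appear in the top-k recommendations."""
    if not relevant_job_ids:
        return 0.0
    hits = len(set(ranked_job_ids[:k]) & relevant_job_ids)
    return hits / len(relevant_job_ids)
```

In practice these per-request values would be averaged over requests and reported by slice (for example, cold-start jobs), with the caveat that logged labels inherit position bias from the current model.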
Part 2 - A/B Test Design
How would you compare the new model against the current model while mitigating network effects?
What This Part Should Cover
- Define user-level randomization for member-facing surfaces where appropriate.
- Consider cluster, geo, job-level, or switchback designs if marketplace interference is material.
- Keep eligibility, candidate generation, logging, and surfaces consistent across arms.
- Include triggered analysis and intent-to-treat analysis.
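One common way to implement the randomization described above is deterministic hashing of the randomization unit. The sketch below is a minimal version under stated assumptions: `assign_arm`, the salt format, and the 50/50 default split are hypothetical choices for illustration, not LinkedIn's actual assignment service.

```python
import hashlib


def assign_arm(unit_id: str, experiment_salt: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a randomization unit to 'treatment' or 'control'.

    unit_id is a member ID for user-level randomization. For cluster or geo
    designs, pass the cluster or geo ID instead so that every member in the
    same cluster lands in the same arm.
    """
    key = f"{experiment_salt}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) / 2 ** 32  # roughly uniform value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"
```

Because assignment depends only on the salt and the unit ID, a member sees the same arm on every surface and session, and using a fresh salt per experiment keeps assignments independent across experiments. Intent-to-treat analysis then compares all assigned units, while triggered analysis restricts to units whose requests would have produced different recommendations between arms.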
Part 3 - Powering and Diagnostics
How would you determine sample size, exposure, duration, and diagnose results?
What This Part Should Cover
- Estimate baseline rates, minimum detectable effect (MDE), variance, power, significance level (alpha), and maturation windows.
- Check for sample ratio mismatch (SRM), logging errors, novelty effects, seasonality, position bias, and segment heterogeneity.
- Analyze by member segment, job type, market, surface, and supply-demand balance.
- Define ship, iterate, or rollback criteria.
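The powering and SRM checks above can be sketched with the standard two-proportion formula and a chi-square goodness-of-fit test. This is a simplified illustration: it assumes independent member-level observations and a binary metric such as completed-apply rate, and the function names and default values are assumptions for this example. Cluster or switchback designs would need a variance inflation adjustment that is not shown here.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate members needed per arm for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def srm_p_value(control_n: int, treatment_n: int,
                expected_treatment_share: float = 0.5) -> float:
    """Chi-square (1 degree of freedom) p-value for sample ratio mismatch."""
    total = control_n + treatment_n
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((treatment_n - expected_treatment) ** 2 / expected_treatment
            + (control_n - expected_control) ** 2 / expected_control)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

For example, `sample_size_per_arm(0.05, 0.05)` sizes a test to detect a 5% relative lift on a 5% baseline rate. A very small `srm_p_value` suggests an assignment or logging problem that should be resolved before any metric is interpreted.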
Follow-up Questions
- What if member apply rate rises but employer response rate falls?
- How would you evaluate a recommender change offline before risking traffic?
- How would you handle cold-start jobs?
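For the offline-evaluation follow-up, one option is counterfactual estimation from logged data, provided the logging policy recorded propensities (as raised in the clarifying questions). The sketch below shows clipped inverse propensity scoring (IPS) and its self-normalized variant (SNIPS). The reward definition (for example, 1 for a completed apply, 0 otherwise), the clip value, and the function name are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def ips_estimates(rewards: Sequence[float],
                  logging_propensities: Sequence[float],
                  target_probs: Sequence[float],
                  clip: float = 10.0) -> Tuple[float, float]:
    """Return (clipped IPS, SNIPS) estimates of the new policy's average reward.

    logging_propensities[i]: probability the current model showed the logged job.
    target_probs[i]: probability the new model would show that same job.
    """
    if not (len(rewards) == len(logging_propensities) == len(target_probs)):
        raise ValueError("all inputs must have the same length")
    if not rewards:
        raise ValueError("at least one logged event is required")
    if any(p <= 0 for p in logging_propensities):
        raise ValueError("logging propensities must be strictly positive")

    # Importance weights, clipped to limit variance from rare logged actions.
    weights = [min(t / p, clip) for t, p in zip(target_probs, logging_propensities)]
    weighted_reward = sum(w * r for w, r in zip(weights, rewards))

    ips = weighted_reward / len(rewards)
    weight_total = sum(weights)
    snips = weighted_reward / weight_total if weight_total > 0 else 0.0
    return ips, snips
```

Clipping trades variance for bias, so these estimates are best treated as a screen before an online test rather than a replacement for one. Jobs the current model never showed have no logged propensity, which is one reason cold-start jobs need separate handling.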