Recommender Systems, Feed Ranking, And Marketplace Metrics
Asked of: Data Scientist

What's being tested
LinkedIn is probing whether a Data Scientist can reason about recommendation quality, feed ranking, and marketplace outcomes beyond “optimize clicks.” Strong answers connect user behavior, model labels, online experiments, offline evaluation, and long-term ecosystem health. The interviewer is looking for judgment: which metrics matter, how to diagnose metric movement, how to separate ranking effects from instrumentation or traffic-mix effects, and how to evaluate tradeoffs across members, creators, recruiters, and job seekers. For LinkedIn specifically, recommendations shape core surfaces like the homepage feed, jobs, notifications, and creator distribution, so small ranking changes can affect engagement, trust, retention, and marketplace liquidity.
Core knowledge
- Multi-stage recommender systems usually separate candidate generation, ranking, and re-ranking. A DS should know the metric purpose of each stage: candidate generation targets recall, ranking targets relevance or utility, and re-ranking enforces diversity, freshness, policy constraints, or marketplace balance.
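- Label choice is the first major modeling decision. Optimizing CTR favors curiosity and clickbait; optimizing dwell_time may favor passive consumption; optimizing apply_rate, connection_accept_rate, or long_click_rate better captures downstream value. A common utility formulation is a weighted sum of predicted action probabilities, $U(u, i) = \sum_a w_a \, P(a \mid u, i)$, where $w_a$ is the value assigned to action $a$ for member $u$ and item $i$, and is negative for harmful actions such as hides or reports.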
- Offline metrics should match the stage being evaluated. Use Recall@K or HitRate@K for retrieval, NDCG@K or MAP@K for ranked relevance, and calibration metrics such as Brier score or reliability curves when predicted probabilities are used directly in ranking or bidding-like tradeoffs.
- Online metrics need a hierarchy: a primary success metric, guardrails, and diagnostics. For feed, primary metrics might include sessions_per_member, feed_engaged_sessions, or quality_weighted_actions; guardrails include hide_rate, unfollow_rate, report_rate, negative_feedback_rate, creator distribution, and member retention.
- Marketplace recommenders require two-sided metrics. For Jobs You May Be Interested In, candidate-side metrics include job_click_rate, save_rate, apply_start_rate, apply_completion_rate, and job-seeker retention. Employer-side metrics include qualified applicants per job, recruiter response rate, fill probability, and applicant quality.
- Experiment design must account for interference. Feed and creator recommendations can violate the stable unit treatment value assumption (SUTVA) because changing exposure for one member changes impressions available to others. For heavy supply-side interactions, consider creator-level, job-level, geo-level, or cluster-randomized designs, or interpret member-level A/B tests as partial-equilibrium effects.
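- Causal diagnosis starts by decomposing a metric. If homepage_sessions drops, break it into eligible users, visits, feed loads per visit, items shown per load, impressions, actions per impression, and downstream retention. A useful decomposition is multiplicative, for example:
  $$\text{actions} = \text{eligible users} \times \frac{\text{visits}}{\text{eligible user}} \times \frac{\text{feed loads}}{\text{visit}} \times \frac{\text{items shown}}{\text{feed load}} \times \frac{\text{actions}}{\text{impression}}$$
  where impressions are the product of the first four factors, so a change in the total can be attributed to the factor whose ratio moved.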
- Segmentation is not optional in ranking problems. Always inspect new versus tenured members, job seekers versus non-job seekers, creators versus consumers, mobile versus desktop, geography, language, network size, and cold-start users. Aggregate gains can hide harm to sparse-history members or niche content producers.
- Cold start changes both modeling and evaluation. For new users, rely more on declared profile fields, onboarding intents, location, industry, skills, and popularity priors. For new items, evaluate exposure fairness and early engagement separately, because historical engagement labels are missing or biased by prior ranking.
- Position bias contaminates naive relevance labels. Items ranked high receive more clicks regardless of quality. A DS should mention randomized exploration buckets, inverse propensity scoring, or interleaving when estimating unbiased relevance. An inverse-propensity-weighted estimate takes the form $\hat{R} = \frac{1}{n} \sum_{i=1}^{n} \frac{c_i}{p_i}$, where $c_i$ is the observed click (0 or 1) on impression $i$ and $p_i$ is the probability the item was shown in that position.
- Short-term and long-term metrics can conflict. A model increasing CTR may reduce trust through low-quality viral posts, stale content, or irrelevant job recommendations. LinkedIn-style surfaces often need composite objectives that include satisfaction surveys, negative feedback, diversity, creator health, and repeat usage after 7 or 28 days.
- Model evaluation is not the same as product evaluation. A higher AUC or NDCG@10 does not guarantee better business outcomes if labels are misaligned, exploration changes exposure, or the new ranker shifts traffic toward low-value actions. A strong DS ties offline lifts to expected online movement, then validates with controlled experiments.
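To make the offline metrics named above concrete, here is a minimal sketch of Recall@K and NDCG@K in plain Python. The item IDs and gain values are hypothetical, and the gain definition (1.0 for an item the member later acted on) is an assumption for illustration, not a prescribed labeling scheme.

```python
import math

def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    return len(set(ranked_items[:k]) & relevant) / len(relevant)

def dcg_at_k(gains, k):
    """Discounted cumulative gain: gain at 0-based position pos is divided by log2(pos + 2)."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_items, gain_by_item, k):
    """DCG of the actual ranking divided by the DCG of the ideal ordering."""
    gains = [gain_by_item.get(item, 0.0) for item in ranked_items]
    ideal = sorted(gain_by_item.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical ranking for one member; job_b and job_d are the relevant items.
ranked = ["job_a", "job_b", "job_c", "job_d"]
gain_by_item = {"job_b": 1.0, "job_d": 1.0}

print(recall_at_k(ranked, gain_by_item.keys(), k=2))
print(ndcg_at_k(ranked, gain_by_item, k=4))
```

Recall@K ignores order within the top K, which suits retrieval; NDCG@K penalizes relevant items placed lower, which suits the ranking stage.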
Worked example
For “Evaluate 'Job You May Be Interested In' Recommender”, a strong candidate would first clarify the recommendation surface: email, homepage module, job tab, or notification, because user intent and acceptable frequency differ. They would ask what the current goal is: more applications, more qualified applications, better job-seeker retention, or employer success. Then they would frame the answer around four pillars: offline relevance evaluation, online A/B testing, marketplace guardrails, and diagnostic segmentation.
The offline section would include Recall@K, NDCG@K, and calibration for predicted apply probability, but would explicitly warn that historical applications are biased by previous exposure. The online experiment would define a primary metric such as qualified apply_completion_rate per eligible member, with guardrails like notification opt-outs, irrelevant-job feedback, employer response rate, and application quality. A key tradeoff to flag is volume versus quality: a ranker can increase applications by showing easy-apply jobs more often while lowering recruiter satisfaction or job-seeker trust. The candidate should also segment by active job seekers, passive candidates, geography, seniority, industry, and cold-start members, because relevance signals vary heavily across those cohorts. They could close by saying: “If I had more time, I would add long-term outcomes like interview starts, hires, 28-day job-seeker retention, and employer repeat posting, because immediate apply clicks are only a proxy.”
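One way to act on the exposure-bias warning is inverse propensity scoring. The sketch below is illustrative only: the per-position propensities are assumed to come from a randomized exploration bucket, and the log format and numbers are hypothetical.

```python
def naive_rate(logs):
    """Raw action rate: ignores where the item was shown."""
    return sum(acted for _, acted in logs) / len(logs)

def ips_relevance(logs, propensity_by_position):
    """Inverse-propensity-weighted rate: each action is up-weighted by
    1 / P(shown and examined at that position)."""
    total = sum(acted / propensity_by_position[pos] for pos, acted in logs)
    return total / len(logs)

# Assumed propensities: position 2 is seen half as often as position 1.
propensity_by_position = {1: 1.0, 2: 0.5}

# Hypothetical logs of (position, applied) for two jobs.
logs_job_a = [(1, 1), (1, 0), (1, 1), (1, 0)]  # always shown at position 1
logs_job_b = [(2, 1), (2, 0), (2, 0), (2, 0)]  # always shown at position 2

for name, logs in [("job_a", logs_job_a), ("job_b", logs_job_b)]:
    print(name, naive_rate(logs), ips_relevance(logs, propensity_by_position))
```

In this toy data the naive rate makes job_b look half as relevant as job_a, while the weighted estimate puts them on equal footing. In practice small propensities produce high-variance estimates, so weights are often clipped.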
A second angle
For “Analyze homepage drop and feed ranking”, the same recommender-system thinking applies, but the framing is diagnostic rather than evaluative. Instead of designing success metrics for a planned experiment, the candidate needs to isolate whether the drop came from traffic mix, logging, eligibility, ranking quality, content supply, or user behavior after exposure. A strong answer decomposes homepage_engagement into users, sessions, feed loads, impressions, ranking positions, action rates, and negative feedback. The ranking angle appears when checking whether specific content types, creators, or member cohorts lost distribution after a model or policy change. The best candidates also separate “true product decline” from measurement artifacts by validating metric consistency across independent signals, without drifting into pipeline implementation details.
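The decomposition step can be sketched as a product of funnel ratios compared across two periods. The counts and field names below are hypothetical, and the specific funnel stages are an assumption about how the metric is logged; the point is that log changes per factor add up to the log change in total actions, so the factor responsible stands out.

```python
import math

def funnel_factors(counts):
    """Express total actions as a product of per-stage ratios."""
    return {
        "eligible_users": counts["eligible_users"],
        "visits_per_user": counts["visits"] / counts["eligible_users"],
        "feed_loads_per_visit": counts["feed_loads"] / counts["visits"],
        "impressions_per_load": counts["impressions"] / counts["feed_loads"],
        "actions_per_impression": counts["actions"] / counts["impressions"],
    }

def log_change_by_factor(before, after):
    """Log ratio of each factor; the values sum to log(actions_after / actions_before)."""
    fb, fa = funnel_factors(before), funnel_factors(after)
    return {name: math.log(fa[name] / fb[name]) for name in fb}

# Hypothetical weekly counts before and after a drop.
before = {"eligible_users": 1000, "visits": 3000, "feed_loads": 6000,
          "impressions": 60000, "actions": 3000}
after = {"eligible_users": 1000, "visits": 3000, "feed_loads": 6000,
         "impressions": 48000, "actions": 2400}

changes = log_change_by_factor(before, after)
biggest_drop = min(changes, key=changes.get)
print(biggest_drop, changes[biggest_drop])
```

Here the action rate per impression is unchanged and the whole drop sits in impressions per feed load, which points toward supply, eligibility, or logging rather than ranking quality.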
Common pitfalls
Pitfall: Optimizing only CTR and calling it recommendation quality.
Clicks are easy to measure but often reward sensational, stale, or low-value content. A better answer distinguishes immediate engagement from long-term member value using dwell_time, meaningful interactions, hides, reports, survey satisfaction, return rate, and marketplace outcomes.
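A toy comparison shows how a click-only ranking and a value-weighted ranking can disagree. Everything here is hypothetical: the action names, the weights, and the predicted probabilities are made up for illustration, and choosing real weights is itself a product decision.

```python
# Assumed value per action; negative feedback is penalized heavily.
WEIGHTS = {"click": 1.0, "long_dwell": 2.0, "hide": -5.0, "report": -20.0}

def utility(predicted):
    """Weighted sum of predicted action probabilities for one item."""
    return sum(w * predicted.get(action, 0.0) for action, w in WEIGHTS.items())

# Hypothetical model predictions for two feed items.
items = {
    "viral_post": {"click": 0.20, "long_dwell": 0.02, "hide": 0.03, "report": 0.005},
    "niche_post": {"click": 0.10, "long_dwell": 0.06, "hide": 0.005, "report": 0.0},
}

by_ctr = sorted(items, key=lambda name: items[name]["click"], reverse=True)
by_utility = sorted(items, key=lambda name: utility(items[name]), reverse=True)

print("ranked by CTR:", by_ctr)
print("ranked by utility:", by_utility)
```

The viral post wins on clicks but loses once hides and reports carry a cost, which is the distinction between immediate engagement and member value in miniature.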
Pitfall: Giving an ML architecture answer when the interviewer asked for DS evaluation.
For a Data Scientist, the core is not how to serve embeddings or tune retrieval latency. Stay focused on label definition, offline-to-online metric alignment, experiment design, causal interpretation, cohort cuts, and whether the recommender improves the LinkedIn ecosystem.
Pitfall: Treating aggregate experiment lift as sufficient.
A feed or jobs recommender can show a positive average treatment effect while harming new members, small creators, niche job categories, or employers receiving lower-quality applicants. Strong candidates proactively discuss heterogeneity, guardrails, multiple testing discipline, and whether the launch decision changes by segment.
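The heterogeneity check can be sketched with a per-segment difference in rates and a two-proportion z-score. The segment names and counts are hypothetical, and the pooled z-test is one simple choice among several, not the only valid readout.

```python
import math

def rate_diff_and_z(n_control, x_control, n_treat, x_treat):
    """Difference in conversion rates (treatment minus control) and pooled z-score."""
    p_c, p_t = x_control / n_control, x_treat / n_treat
    pooled = (x_control + x_treat) / (n_control + n_treat)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treat))
    return p_t - p_c, (p_t - p_c) / se

# Hypothetical counts: (control users, control conversions, treat users, treat conversions).
segments = {
    "tenured_members": (9000, 900, 9000, 1080),
    "new_members": (1000, 100, 1000, 70),
}

# The aggregate readout sums counts across segments.
totals = [sum(counts[i] for counts in segments.values()) for i in range(4)]
overall_diff, overall_z = rate_diff_and_z(*totals)

by_segment = {name: rate_diff_and_z(*counts) for name, counts in segments.items()}
harmed = [name for name, (diff, _) in by_segment.items() if diff < 0]

print("overall:", overall_diff, overall_z)
print("segments with negative effect:", harmed)
```

The aggregate effect is positive while new members move the wrong way. With many segments, a stricter threshold (for example a Bonferroni correction) would be needed before reading any single z-score as significant.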
Connections
Interviewers may pivot from here into A/B testing, causal inference, metric design, ranking evaluation, or marketplace analytics. Be ready to discuss interference, novelty effects, counterfactual evaluation, power analysis, and how to choose between competing north-star and guardrail metrics.
Further reading
- Recommender Systems Handbook, Ricci, Rokach, and Shapira: comprehensive reference for retrieval, ranking, evaluation, and recommender tradeoffs.
- Deep Neural Networks for YouTube Recommendations, Covington et al.: classic multi-stage recommendation architecture with practical label and ranking considerations.
- Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms, Li et al.: foundational paper for counterfactual evaluation and propensity-based reasoning in recommendation systems.
Practice questions
- Analyze homepage drop and feed ranking (LinkedIn · Data Scientist · Technical Screen · hard)
- Design a short-video recommendation system (LinkedIn · Data Scientist · Technical Screen · medium)
- Evaluate 'Job You May Be Interested In' Recommender (LinkedIn · Data Scientist · Onsite · hard)
- Assess LinkedIn Newsfeed Health (LinkedIn · Data Scientist · Onsite · hard)
- One of the most comprehensive LinkedIn DS Product Cases! (LinkedIn · Data Scientist · Onsite · hard)
Related concepts
- Recommender And Ranking Systems (Machine Learning)
- Recommender Systems And Feed Ranking (Machine Learning)
- Recommender, Ranking, And Ads ML Systems
- Ranking, Recommendation, And Feedback Systems (ML System Design)
- Recommendation, Ads Ranking And Marketplace Objectives (Machine Learning)
- Recommender, Ranking, And Ads Systems