Design an end-to-end ML system to power a "Skills" feature for a professional social network.
The product wants to:
- Extract and infer a member’s skills from profile text, resume/CV uploads, job titles/descriptions, projects, and possibly user behavior.
- Normalize skills to a canonical taxonomy (e.g., map "PyTorch" vs "pytorch" vs "torch" to the same skill).
- Optionally recommend missing skills to add.
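To make the normalization goal concrete, here is a minimal sketch of alias-based canonicalization. The alias table, the `skill:...` ID format, and the function names are all hypothetical, chosen only for illustration. A real system would likely combine a curated taxonomy with a learned entity-linking model, since context-free aliases such as "torch" can be ambiguous.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical alias table: normalized surface form -> canonical skill ID.
# In practice this would come from a curated taxonomy, not a hard-coded dict.
ALIASES = {
    "pytorch": "skill:pytorch",
    "py torch": "skill:pytorch",
    "torch": "skill:pytorch",  # assumption: ambiguous without context
    "tensorflow": "skill:tensorflow",
}


def normalize_surface(text: str) -> str:
    """Case-fold, Unicode-normalize, and collapse punctuation and whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Keep characters that are meaningful in skill names such as c++, c#, node.js.
    text = re.sub(r"[^\w+#.]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def canonicalize(mention: str) -> Optional[str]:
    """Return the canonical skill ID for a mention, or None if it is unknown."""
    return ALIASES.get(normalize_surface(mention))


if __name__ == "__main__":
    for mention in ["PyTorch", "pytorch", " torch ", "Py-Torch", "Basket Weaving"]:
        print(repr(mention), "->", canonicalize(mention))
```

Exact alias lookup is only a first pass under these assumptions; unknown mentions (those returning `None`) would need a fallback such as embedding-based nearest-neighbor matching or human review before entering the taxonomy.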
Requirements
- High precision for visible skills; avoid embarrassing incorrect skills.
- Support near-real-time updates when a user edits their profile or uploads a new resume.
- Must scale to tens/hundreds of millions of members.
- Consider privacy/security for resume parsing.
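One possible reading of the high-precision requirement is sketched below: inferred skills are shown only above a strict confidence threshold, and mid-confidence skills are surfaced as suggestions for the member to confirm. The `SkillCandidate` type, the threshold values, and the assumption that scores are calibrated probabilities are all illustrative and not part of the prompt.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SkillCandidate:
    skill_id: str
    score: float  # assumed to be a calibrated probability in [0, 1]


# Hypothetical thresholds; real values would be tuned on labeled data
# to hit a target precision for skills shown on the profile.
VISIBLE_THRESHOLD = 0.95
SUGGEST_THRESHOLD = 0.60


def route(candidates: Iterable[SkillCandidate]) -> Tuple[List[str], List[str]]:
    """Split candidates into (visible, suggested); low scores are dropped."""
    visible: List[str] = []
    suggested: List[str] = []
    for c in candidates:
        if c.score >= VISIBLE_THRESHOLD:
            visible.append(c.skill_id)
        elif c.score >= SUGGEST_THRESHOLD:
            suggested.append(c.skill_id)
    return visible, suggested


if __name__ == "__main__":
    cands = [
        SkillCandidate("skill:pytorch", 0.97),
        SkillCandidate("skill:kubernetes", 0.70),
        SkillCandidate("skill:cobol", 0.20),
    ]
    print(route(cands))
```

Routing mid-confidence skills to a member-confirmed suggestion flow trades recall on the visible list for precision, and the confirmations themselves could serve as labels, though that depends on the product design.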
What to cover
- Data sources and labeling strategy
- Taxonomy/ontology and normalization
- Model approach(es) and features
- Training pipeline and evaluation metrics
- Serving architecture (online vs offline), freshness/latency
- Monitoring, bias/fairness, abuse/gaming, and iteration plan