This Coding & Algorithms problem in the Data Science domain evaluates similarity search and ranking skills, including selection and application of distance metrics (e.g., MSE, RMSE, cosine or Manhattan), handling of NULLs, feature scaling/weighting, and efficient top‑k retrieval across datasets.
You are given two datasets with the same feature columns:
source (rows you want to match): source_id (STRING/INT), f1...fk (NUMERIC; may contain NULLs)
target (candidate rows to search): target_id (STRING/INT), f1...fk (NUMERIC; may contain NULLs)
Task (choose Python or SQL): For each row in source, find the top 5 rows in target that are most similar when considering all features.
Requirements / clarifications:
source_id, target_id, distance (smaller = more similar), rank (1–5 per source_id).
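One possible Python approach is sketched below; it is an illustration under stated assumptions, not a reference solution. The function name top_k_matches and its parameters are hypothetical. It assumes RMSE as the distance, z-score scaling based on the target's statistics, NULLs represented as NaN, distances averaged only over features present in both rows, and pairs with no shared feature excluded from the result. Ties are broken by target row order.

```python
import numpy as np
import pandas as pd


def top_k_matches(source, target, features, k=5):
    """Return up to k nearest target rows per source row (NULL-aware RMSE)."""
    # Assumption: scale every feature with the target's mean and std so that
    # no single feature dominates the distance. Constant or all-NULL columns
    # fall back to a std of 1.0 to avoid division by zero.
    mu = target[features].mean()
    sigma = target[features].std(ddof=0).replace(0, 1.0).fillna(1.0)
    S = ((source[features] - mu) / sigma).to_numpy(dtype=float)
    T = ((target[features] - mu) / sigma).to_numpy(dtype=float)

    src_ids = source["source_id"].to_numpy()
    tgt_ids = target["target_id"].to_numpy()

    rows = []
    for i, s in enumerate(S):
        diff = T - s                     # NaN wherever either side is NULL
        shared = ~np.isnan(diff)         # features present in both rows
        n_shared = shared.sum(axis=1)
        sq_sum = np.where(shared, diff ** 2, 0.0).sum(axis=1)

        # Assumption: a pair with no shared feature is not comparable.
        dist = np.full(len(T), np.inf)
        ok = n_shared > 0
        dist[ok] = np.sqrt(sq_sum[ok] / n_shared[ok])

        # Stable sort keeps target order for ties, so ranks are deterministic.
        order = np.argsort(dist, kind="stable")[:k]
        order = order[np.isfinite(dist[order])]

        for rank, j in enumerate(order, start=1):
            rows.append((src_ids[i], tgt_ids[j], float(dist[j]), rank))

    return pd.DataFrame(
        rows, columns=["source_id", "target_id", "distance", "rank"]
    )
```

This compares every source row with every target row, which is fine for modest sizes. For large targets, replacing the full sort with np.argpartition, or using an approximate nearest-neighbour index, would be the natural next step; per-feature weights could be added by multiplying the squared differences before summing.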