Find linked user records by weighted similarity

This question tests similarity-based record linkage, weighted field scoring, and graph connectivity: designing a scalable matching strategy, applying a similarity threshold, and handling both direct and indirect links between records.

Question

You are given a list of user records. Each record has fields:

id (unique)
name
email
company

You are also given:

weights: a map from field name to weight (e.g., name: 0.2, email: 0.5, company: 0.3)
threshold: a float
target_user_id: the id of the record whose matches you must return

Similarity scoring

Define similarity(recordA, recordB) as the sum over fields of:

weights[field] * field_similarity(field_valueA, field_valueB)

where field_similarity returns a value in [0, 1]. The exact function is given or agreed on during the interview; for example, exact match => 1, otherwise 0, or a normalized string similarity.
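Two possible field_similarity functions are sketched below. Both are assumptions, since the interview leaves the function open: `exact_match` treats values as equal after trimming and case-folding (a normalization choice made here, not given in the problem), and `string_similarity` uses the standard library's `difflib.SequenceMatcher.ratio()`, which already returns a value in [0, 1].

```python
from difflib import SequenceMatcher


def _normalize(value: str) -> str:
    # Assumed normalization: ignore surrounding whitespace and letter case.
    return value.strip().casefold()


def exact_match(a: str, b: str) -> float:
    """1.0 if the normalized values are identical, otherwise 0.0."""
    return 1.0 if _normalize(a) == _normalize(b) else 0.0


def string_similarity(a: str, b: str) -> float:
    """Fuzzy score in [0, 1]: 2 * matched_chars / (len(a) + len(b))."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


print(exact_match("Alice@Example.com", "alice@example.com"))  # 1.0
print(exact_match("Acme", "Acme Inc"))                        # 0.0
print(round(string_similarity("Acme", "Acme Inc"), 3))        # 0.667
```

Exact match keeps scores discrete, which makes blocking (see the Notes) lossless; a fuzzy score catches typos but makes it harder to prune candidate pairs.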

Two records are considered linked if their total similarity score is >= threshold.
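The weighted score and the linking rule translate almost directly into code. In this sketch, records are plain dicts, fields missing from `weights` are ignored, and the names `similarity`, `is_linked` and the `eps` tolerance are hypothetical. The tolerance is an added assumption: sums such as 0.2 + 0.5 are computed in floating point, so a score meant to equal the threshold exactly should not be rejected by a rounding error.

```python
from typing import Callable, Dict

Record = Dict[str, str]
FieldSim = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Assumed field_similarity: 1.0 on an exact match, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def similarity(a: Record, b: Record, weights: Dict[str, float],
               field_sim: FieldSim = exact_match) -> float:
    """Sum over weighted fields of weights[field] * field_sim(valueA, valueB)."""
    return sum(w * field_sim(a[field], b[field]) for field, w in weights.items())


def is_linked(a: Record, b: Record, weights: Dict[str, float], threshold: float,
              field_sim: FieldSim = exact_match, eps: float = 1e-9) -> bool:
    """Linked iff the total score reaches the threshold (eps absorbs float rounding)."""
    return similarity(a, b, weights, field_sim) >= threshold - eps


weights = {"name": 0.2, "email": 0.5, "company": 0.3}
a = {"id": "1", "name": "Alice", "email": "a@x.com", "company": "Acme"}
b = {"id": "2", "name": "Alice", "email": "a@x.com", "company": "Globex"}
print(similarity(a, b, weights))                 # name + email match: about 0.7
print(is_linked(a, b, weights, threshold=0.7))   # True
```

Passing `field_sim` as a parameter keeps the scoring rule independent of whichever per-field comparison the interviewer chooses.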

Task

Return all record IDs that should be considered the same user as target_user_id.
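For the base task, the simplest approach compares the target against every other record: n - 1 similarity evaluations, each costing one comparison per field. The sketch assumes exact-match field similarity, a small hypothetical dataset, and hypothetical names (`direct_matches`, `include_target`); the `include_target` flag reflects the open question in the Notes about whether the target itself belongs in the output.

```python
from typing import Dict, List

Record = Dict[str, str]
EPS = 1e-9  # assumed tolerance for floating-point rounding in weighted sums


def similarity(a: Record, b: Record, weights: Dict[str, float]) -> float:
    # Assumed exact-match field similarity: 1.0 if equal, else 0.0.
    return sum(w * (1.0 if a[f] == b[f] else 0.0) for f, w in weights.items())


def direct_matches(records: List[Record], weights: Dict[str, float],
                   threshold: float, target_id: str,
                   include_target: bool = False) -> List[str]:
    """IDs whose score against the target is >= threshold, in input order."""
    target = next(r for r in records if r["id"] == target_id)
    result = [r["id"] for r in records
              if r["id"] != target_id
              and similarity(target, r, weights) >= threshold - EPS]
    return [target_id] + result if include_target else result


# Hypothetical sample data: 1-2 and 2-3 and 3-4 are linked at threshold 0.7.
WEIGHTS = {"name": 0.2, "email": 0.5, "company": 0.3}
RECORDS = [
    {"id": "1", "name": "Alice", "email": "a@x.com", "company": "Acme"},
    {"id": "2", "name": "Alice", "email": "a@x.com", "company": "Globex"},
    {"id": "3", "name": "Al", "email": "a@x.com", "company": "Globex"},
    {"id": "4", "name": "Al", "email": "a@x.com", "company": "Initech"},
    {"id": "5", "name": "Bob", "email": "b@y.com", "company": "Acme"},
]
print(direct_matches(RECORDS, WEIGHTS, 0.7, "1"))                       # ['2']
print(direct_matches(RECORDS, WEIGHTS, 0.7, "1", include_target=True))  # ['1', '2']
```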

Follow-up 1: include 1-hop indirect links

Include not only records directly linked to the target, but also records linked to those direct matches (i.e., within 2 steps from the target), even if they are not directly linked to the target.
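One way to handle the 2-step case is a breadth-first search that stops after k levels, where edges are discovered on the fly by scoring the current frontier against all records. The sketch below makes the same assumptions as before (exact-match fields, hypothetical sample data, an `EPS` rounding tolerance); `within_k_hops` and its `k` parameter are hypothetical names, with k = 2 matching this follow-up.

```python
from typing import Dict, List

Record = Dict[str, str]
EPS = 1e-9  # assumed tolerance for floating-point rounding in weighted sums


def similarity(a: Record, b: Record, weights: Dict[str, float]) -> float:
    # Assumed exact-match field similarity: 1.0 if equal, else 0.0.
    return sum(w * (1.0 if a[f] == b[f] else 0.0) for f, w in weights.items())


def within_k_hops(records: List[Record], weights: Dict[str, float],
                  threshold: float, target_id: str, k: int = 2) -> List[str]:
    """IDs reachable from target_id in at most k linked steps (target excluded)."""
    by_id = {r["id"]: r for r in records}
    depth = {target_id: 0}      # record id -> hop distance from the target
    frontier = [target_id]
    for hop in range(1, k + 1):
        next_frontier = []
        for u in frontier:
            for r in records:
                v = r["id"]
                # Each record is assigned the first (smallest) hop that reaches it.
                if v not in depth and similarity(by_id[u], r, weights) >= threshold - EPS:
                    depth[v] = hop
                    next_frontier.append(v)
        frontier = next_frontier
    return sorted(v for v in depth if v != target_id)


# Hypothetical sample data: 1-2 and 2-3 and 3-4 are linked at threshold 0.7.
WEIGHTS = {"name": 0.2, "email": 0.5, "company": 0.3}
RECORDS = [
    {"id": "1", "name": "Alice", "email": "a@x.com", "company": "Acme"},
    {"id": "2", "name": "Alice", "email": "a@x.com", "company": "Globex"},
    {"id": "3", "name": "Al", "email": "a@x.com", "company": "Globex"},
    {"id": "4", "name": "Al", "email": "a@x.com", "company": "Initech"},
    {"id": "5", "name": "Bob", "email": "b@y.com", "company": "Acme"},
]
print(within_k_hops(RECORDS, WEIGHTS, 0.7, "1", k=2))  # ['2', '3']
```

With k = 1 this reduces to the base task. For k = 2 the cost is roughly (1 + number of direct matches) × n similarity evaluations, since only the target and its direct matches are expanded.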

Follow-up 2: include all indirect links (connected component)

Return all record IDs in the entire connected component containing target_user_id, where edges connect pairs of records whose similarity is >= threshold.
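Removing the depth limit turns the search into a plain BFS over the implicit similarity graph: every record pulled off the queue is scored against all records not yet seen, so the cost is about |component| × n evaluations (O(n²) in the worst case) and no edge list is built up front. The sketch reuses the earlier assumptions (exact-match fields, `EPS` tolerance, hypothetical sample data); `connected_component` is a hypothetical name.

```python
from collections import deque
from typing import Dict, List

Record = Dict[str, str]
EPS = 1e-9  # assumed tolerance for floating-point rounding in weighted sums


def similarity(a: Record, b: Record, weights: Dict[str, float]) -> float:
    # Assumed exact-match field similarity: 1.0 if equal, else 0.0.
    return sum(w * (1.0 if a[f] == b[f] else 0.0) for f, w in weights.items())


def connected_component(records: List[Record], weights: Dict[str, float],
                        threshold: float, target_id: str) -> List[str]:
    """All IDs connected to target_id through links >= threshold (target excluded)."""
    by_id = {r["id"]: r for r in records}
    seen = {target_id}
    queue = deque([target_id])
    while queue:
        u = queue.popleft()
        for r in records:
            v = r["id"]
            if v not in seen and similarity(by_id[u], r, weights) >= threshold - EPS:
                seen.add(v)
                queue.append(v)
    return sorted(seen - {target_id})


# Hypothetical sample data: 1-2 and 2-3 and 3-4 are linked at threshold 0.7.
WEIGHTS = {"name": 0.2, "email": 0.5, "company": 0.3}
RECORDS = [
    {"id": "1", "name": "Alice", "email": "a@x.com", "company": "Acme"},
    {"id": "2", "name": "Alice", "email": "a@x.com", "company": "Globex"},
    {"id": "3", "name": "Al", "email": "a@x.com", "company": "Globex"},
    {"id": "4", "name": "Al", "email": "a@x.com", "company": "Initech"},
    {"id": "5", "name": "Bob", "email": "b@y.com", "company": "Acme"},
]
print(connected_component(RECORDS, WEIGHTS, 0.7, "1"))  # ['2', '3', '4']
```

Record 4 is included even though it scores only 0.5 against the target, because it is reachable through the chain 1-2-3-4.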

Notes

Clarify whether the output includes the target ID itself.
Aim for an approach that avoids unnecessary pairwise comparisons when possible (discuss indexing/blocking if relevant).
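A common blocking scheme indexes records by (field, value) and only scores pairs that share at least one key. Under exact-match field similarity with non-negative weights and a positive threshold, this loses no links: a pair with no matching field scores 0 and cannot be linked. With a fuzzy field_similarity, blocking becomes approximate. The sketch then merges linked candidate pairs with union-find, which also answers repeated queries for different targets cheaply. Names and sample data are hypothetical, and very common values (e.g. a large company) still produce large blocks.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]
EPS = 1e-9  # assumed tolerance for floating-point rounding in weighted sums


def similarity(a: Record, b: Record, weights: Dict[str, float]) -> float:
    # Assumed exact-match field similarity: 1.0 if equal, else 0.0.
    return sum(w * (1.0 if a[f] == b[f] else 0.0) for f, w in weights.items())


def component_with_blocking(records: List[Record], weights: Dict[str, float],
                            threshold: float, target_id: str) -> List[str]:
    by_id = {r["id"]: r for r in records}
    if threshold <= EPS:
        # Every pair scores >= 0, so all records end up in one component.
        return sorted(rid for rid in by_id if rid != target_id)

    # Blocking: group record ids by each (field, value) key.
    blocks: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for r in records:
        for f in weights:
            blocks[(f, r[f])].append(r["id"])
    # Candidate pairs share at least one key; the set drops duplicate pairs.
    candidates: Set[Tuple[str, str]] = set()
    for ids in blocks.values():
        candidates.update(combinations(sorted(ids), 2))

    parent = {rid: rid for rid in by_id}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union only the candidate pairs whose full score passes the threshold.
    for a, b in candidates:
        if similarity(by_id[a], by_id[b], weights) >= threshold - EPS:
            parent[find(a)] = find(b)

    root = find(target_id)
    return sorted(rid for rid in by_id if rid != target_id and find(rid) == root)


# Hypothetical sample data: 1-2 and 2-3 and 3-4 are linked at threshold 0.7.
WEIGHTS = {"name": 0.2, "email": 0.5, "company": 0.3}
RECORDS = [
    {"id": "1", "name": "Alice", "email": "a@x.com", "company": "Acme"},
    {"id": "2", "name": "Alice", "email": "a@x.com", "company": "Globex"},
    {"id": "3", "name": "Al", "email": "a@x.com", "company": "Globex"},
    {"id": "4", "name": "Al", "email": "a@x.com", "company": "Initech"},
    {"id": "5", "name": "Bob", "email": "b@y.com", "company": "Acme"},
]
print(component_with_blocking(RECORDS, WEIGHTS, 0.7, "1"))  # ['2', '3', '4']
```

A tighter variant would block only on fields (or field combinations) whose weights alone can reach the threshold; with these weights, every linked pair must share an email, so blocking on email alone would suffice.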