PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates the ability to compute weighted field-based similarity and reason about adjacency relationships in an undirected graph, testing competencies in similarity scoring, record linkage, and basic graph traversal.

  • medium
  • Stripe
  • Coding & Algorithms
  • Software Engineer

Find linked user records by similarity

Company: Stripe

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

You are given a list of user records. Each record has the following fields: - `id` (integer, unique) - `name` (string) - `email` (string) - `company` (string) You are also given a map `weights` from field name to a non-negative floating-point weight, for example: ```text name: 0.2 email: 0.5 company:0.3 ``` Define the **similarity score** between two records `A` and `B` as: - For each field among `{name, email, company}`: - If `A.field == B.field` (string equality), add `weights[field]` to the score. - Otherwise add `0`. - The total similarity score is the sum over all fields. Two records are said to be **directly linked** if their similarity score is **greater than or equal to** a given `threshold` (where `0 <= threshold <= 1`). Treat the relation as **undirected**: if record `i` is directly linked to record `j`, then `j` is directly linked to `i`. You are given: - `rows`: a list of user records as described above. - `weights`: the map from field name to weight. - `threshold`: a floating-point number. - `target_user_id`: the `id` of a record in `rows`. Assume: - `id` values are unique. - `rows` fits in memory (you do not need to design a distributed system). Example: ```text rows = [ { id: 1, name: "Alice", email: "alice@gmail.com", company: "Stripe" }, { id: 2, name: "Alicia", email: "alice@gmail.com", company: "Stripe" }, { id: 3, name: "Alice", email: "alice@yahoo.com", company: "Google" }, { id: 4, name: "Bob", email: "bob@gmail.com", company: "Stripe" } ] weights = { name: 0.2, email: 0.5, company: 0.3 } threshold = 0.5 ``` Under the definition above: - Records 1 and 2 have the same `email` and `company`, so their similarity is `0.5 + 0.3 = 0.8 >= 0.5` → directly linked. - Records 1 and 3 only share the same `name`, similarity is `0.2 < 0.5` → **not** directly linked. - Records 2 and 3 share no identical fields under this example (depending on the exact strings) → assume not directly linked. --- ### Part 1 — Direct matches for a target user Implement a function that, given `rows`, `weights`, `threshold`, and `target_user_id`, returns **all record IDs** whose records are **directly linked** to the target user's record (based solely on the similarity score definition). You can choose any reasonable function signature, such as: ```text find_direct_links(rows, weights, threshold, target_user_id) -> List[int] ``` The result should **not** include `target_user_id` itself. --- ### Part 2 — Include exactly one layer of indirect matches Now extend the requirement. Construct an undirected graph where: - Each node is a record (identified by its `id`). - There is an edge between two nodes if the corresponding records are **directly linked** (similarity ≥ threshold). For a given `target_user_id`, you must now return all record IDs that are: - Directly linked to the target **or** - Directly linked to some record that is directly linked to the target (i.e., reachable from the target by a path of length **at most 2 edges**). Example scenario: - 1 and 2 are directly linked. - 2 and 3 are directly linked. - 1 and 3 are **not** directly linked. In this case, for `target_user_id = 1`, your output should include `{2, 3}`: record 2 is a direct match; record 3 is a one-hop indirect match through 2. Design and implement this extended query. --- ### Part 3 — All transitively linked users (full component) Further extend the requirement so that, for a given `target_user_id`, you return **all record IDs in the same connected component** as the target in the graph defined above. In other words, return every record that is reachable from the target by **any number of edges** (0 or more), based on the direct-link relation. Implement a function such as: ```text find_all_linked(rows, weights, threshold, target_user_id) -> List[int] ``` that returns the IDs of all records (excluding or including the target itself — specify your choice) that belong to the same connected component as the target.

Quick Answer: This question evaluates the ability to compute weighted field-based similarity and reason about adjacency relationships in an undirected graph, testing competencies in similarity scoring, record linkage, and basic graph traversal.

Find linked user records by similarity — Part 1: Direct matches

You are given a list of user records. Each record has the fields `id` (unique int), `name` (string), `email` (string), and `company` (string). You are also given `weights`, a map from each of the fields `{name, email, company}` to a non-negative float weight (e.g. name=0.2, email=0.5, company=0.3). The **similarity score** between two records A and B is the sum, over the three fields, of `weights[field]` whenever `A.field == B.field` (exact string equality), and 0 otherwise. Two records are **directly linked** if their similarity score is `>= threshold` (the relation is undirected). Given `rows`, `weights`, `threshold`, and a `target_user_id` (the id of a record in `rows`), return **all record IDs directly linked to the target's record**. The result must **not** include `target_user_id` itself. Return the IDs sorted in ascending order. Example: with the rows {1:Alice/alice@gmail.com/Stripe, 2:Alicia/alice@gmail.com/Stripe, 3:Alice/alice@yahoo.com/Google, 4:Bob/bob@gmail.com/Stripe}, weights {name:0.2, email:0.5, company:0.3}, threshold 0.5, target 1 -> records 1 and 2 share email+company (0.8 >= 0.5) so the answer is [2].

Constraints

  • id values are unique integers.
  • 0 <= threshold <= 1.
  • weights are non-negative floats over the fields {name, email, company}.
  • rows fits in memory.
  • target_user_id is the id of a record present in rows.

Examples

Input: ([{'id': 1, 'name': 'Alice', 'email': 'alice@gmail.com', 'company': 'Stripe'}, {'id': 2, 'name': 'Alicia', 'email': 'alice@gmail.com', 'company': 'Stripe'}, {'id': 3, 'name': 'Alice', 'email': 'alice@yahoo.com', 'company': 'Google'}, {'id': 4, 'name': 'Bob', 'email': 'bob@gmail.com', 'company': 'Stripe'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: [2]

Explanation: Records 1 and 2 share email (0.5) and company (0.3) = 0.8 >= 0.5. Record 3 shares only name (0.2) and record 4 shares only company (0.3), both below 0.5.

Input: ([{'id': 1, 'name': 'Alice', 'email': 'alice@gmail.com', 'company': 'Stripe'}, {'id': 2, 'name': 'Alicia', 'email': 'alice@gmail.com', 'company': 'Stripe'}, {'id': 3, 'name': 'Alice', 'email': 'alice@yahoo.com', 'company': 'Google'}, {'id': 4, 'name': 'Bob', 'email': 'bob@gmail.com', 'company': 'Stripe'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.2, 1)

Expected Output: [2, 3, 4]

Explanation: With threshold 0.2 every record sharing at least the name (3), company (4), or email+company (2) clears the bar.

Input: ([{'id': 1, 'name': 'Alice', 'email': 'alice@gmail.com', 'company': 'Stripe'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: []

Explanation: Only the target exists, so there is nothing to link to.

Input: ([{'id': 10, 'name': 'X', 'email': 'x@a.com', 'company': 'C'}, {'id': 20, 'name': 'X', 'email': 'x@a.com', 'company': 'C'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 1.0, 10)

Expected Output: [20]

Explanation: Threshold 1.0 requires all three fields to match; records 10 and 20 are identical so similarity is exactly 1.0.

Input: ([{'id': 1, 'name': 'A', 'email': 'a', 'company': 'c'}, {'id': 2, 'name': 'B', 'email': 'b', 'company': 'd'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.0, 1)

Expected Output: [2]

Explanation: Threshold 0.0 links every other record regardless of shared fields (0 >= 0).

Hints

  1. Locate the target record first; if it is absent, return an empty list.
  2. Similarity only depends on exact-string equality per field, so you just sum the weights of the matching fields.
  3. Skip the target itself, and sort the resulting IDs before returning for a deterministic answer.

Find linked user records by similarity — Part 2: Up to one indirect hop

Same setup as Part 1 (records, weights, threshold, and the directly-linked relation). Build the undirected graph whose nodes are records and whose edges connect any two **directly linked** records (similarity >= threshold). For a given `target_user_id`, return all record IDs reachable from the target by a path of **at most 2 edges** — that is, records that are directly linked to the target, OR directly linked to some record that is directly linked to the target. Exclude `target_user_id` itself. Return the IDs sorted ascending. Example scenario: 1-2 are directly linked and 2-3 are directly linked, but 1-3 are not. For target 1 the answer is [2, 3] (2 is direct, 3 is reachable in two hops through 2).

Constraints

  • id values are unique integers.
  • 0 <= threshold <= 1.
  • Edges are undirected: i linked to j implies j linked to i.
  • Reachability bound is exactly 2 edges (direct + one indirect hop).
  • target_user_id is present in rows.

Examples

Input: ([{'id': 1, 'name': 'Alice', 'email': 'alice@gmail.com', 'company': 'Stripe'}, {'id': 2, 'name': 'Alicia', 'email': 'alice@gmail.com', 'company': 'Stripe'}, {'id': 3, 'name': 'Alicia', 'email': 'alice@yahoo.com', 'company': 'Stripe'}, {'id': 4, 'name': 'Bob', 'email': 'bob@x.com', 'company': 'Google'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: [2, 3]

Explanation: 1-2 linked via email+company (0.8). 2-3 linked via name+company (0.5). 1-3 not directly linked (only company, 0.3). So 3 is reached through 2 in two hops; 4 is isolated.

Input: ([{'id': 1, 'name': 'Alice', 'email': 'alice@gmail.com', 'company': 'Stripe'}, {'id': 2, 'name': 'Alicia', 'email': 'alice@gmail.com', 'company': 'Stripe'}, {'id': 3, 'name': 'Alice', 'email': 'alice@yahoo.com', 'company': 'Google'}, {'id': 4, 'name': 'Bob', 'email': 'bob@gmail.com', 'company': 'Stripe'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: [2]

Explanation: Only 1-2 clears the threshold and 2 has no further qualifying neighbors, so two-hop reach is just [2].

Input: ([{'id': 1, 'name': 'A', 'email': 'e1', 'company': 'X'}, {'id': 2, 'name': 'A', 'email': 'e2', 'company': 'X'}, {'id': 3, 'name': 'A', 'email': 'e3', 'company': 'X'}, {'id': 4, 'name': 'A', 'email': 'e4', 'company': 'X'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: [2, 3, 4]

Explanation: Every pair shares name+company (0.5), so the graph is a clique; all other nodes are within one hop of the target.

Input: ([{'id': 1, 'name': 'A', 'email': 'e', 'company': 'c'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: []

Explanation: Single record has no neighbors.

Input: ([{'id': 1, 'name': 'A', 'email': 'e1', 'company': 'c1'}, {'id': 2, 'name': 'B', 'email': 'e2', 'company': 'c2'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: []

Explanation: The two records share nothing, so the target has no links at all.

Input: ([{'id': 1, 'name': 'n1', 'email': 'shared', 'company': 'c1'}, {'id': 2, 'name': 'n2', 'email': 'shared', 'company': 'c2'}, {'id': 3, 'name': 'n3', 'email': 'shared', 'company': 'c3'}, {'id': 4, 'name': 'n4', 'email': 'shared', 'company': 'c4'}, {'id': 5, 'name': 'lonely', 'email': 'lonely', 'company': 'lonely'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: [2, 3, 4]

Explanation: Records 1-4 all share the same email (0.5) forming a clique; node 5 is isolated and unreachable.

Hints

  1. First materialize the direct-link graph by comparing all pairs once (similarity is symmetric).
  2. Collect the target's direct neighbors, then union in each direct neighbor's neighbors — that is precisely the set of nodes within 2 edges.
  3. Remember to drop the target id from the final set (it shows up as a neighbor of its own neighbors).

Find linked user records by similarity — Part 3: Full connected component

Same direct-link graph as Parts 1 and 2. For a given `target_user_id`, return **all record IDs in the same connected component** as the target — every record reachable from the target by any number of edges (transitive closure of the direct-link relation). Exclude `target_user_id` itself from the result and return the IDs sorted ascending. This generalizes Part 2 from "at most 2 hops" to "any number of hops," so a graph traversal (BFS or DFS) from the target over the direct-link edges yields the whole component.

Constraints

  • id values are unique integers.
  • 0 <= threshold <= 1.
  • Component membership is the transitive closure of the undirected direct-link relation.
  • The result excludes target_user_id (it is reachable from itself by definition).
  • target_user_id is present in rows.

Examples

Input: ([{'id': 1, 'name': 'a', 'email': 'shared1', 'company': 'x'}, {'id': 2, 'name': 'b', 'email': 'shared1', 'company': 'k'}, {'id': 3, 'name': 'b', 'email': 'zzz', 'company': 'k'}, {'id': 4, 'name': 'b', 'email': 'qqq', 'company': 'k'}, {'id': 5, 'name': 'lonely', 'email': 'solo', 'company': 'solo'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: [2, 3, 4]

Explanation: 1-2 linked via email (0.5). 2-3, 2-4, 3-4 linked via name+company (0.5). 1 is NOT directly linked to 3 or 4 (no shared fields) yet they are in the same component, so transitive traversal returns [2,3,4]; node 5 is isolated.

Input: ([{'id': 1, 'name': 'Alice', 'email': 'alice@gmail.com', 'company': 'Stripe'}, {'id': 2, 'name': 'Alicia', 'email': 'alice@gmail.com', 'company': 'Stripe'}, {'id': 3, 'name': 'Alice', 'email': 'alice@yahoo.com', 'company': 'Google'}, {'id': 4, 'name': 'Bob', 'email': 'bob@gmail.com', 'company': 'Stripe'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: [2]

Explanation: Only 1-2 are linked; that two-node component minus the target is [2].

Input: ([{'id': 1, 'name': 'A', 'email': 'e', 'company': 'c'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 1)

Expected Output: []

Explanation: Single record forms a component of size 1; excluding the target leaves nothing.

Input: ([{'id': 1, 'name': 'A', 'email': 'e1', 'company': 'c1'}, {'id': 2, 'name': 'B', 'email': 'e2', 'company': 'c2'}, {'id': 3, 'name': 'C', 'email': 'e3', 'company': 'c3'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 2)

Expected Output: []

Explanation: No two records share enough; the target's component is just itself.

Input: ([{'id': 1, 'name': 's', 'email': 'a', 'company': 'b'}, {'id': 2, 'name': 's', 'email': 'a', 'company': 'b'}, {'id': 3, 'name': 's', 'email': 'a', 'company': 'b'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 1.0, 2)

Expected Output: [1, 3]

Explanation: All three records are identical (similarity 1.0 >= 1.0) so they form one component; from target 2 the rest is [1,3].

Input: ([{'id': 7, 'name': 'a', 'email': 'shared', 'company': 'c7'}, {'id': 8, 'name': 'b', 'email': 'shared', 'company': 'c8'}, {'id': 9, 'name': 'd', 'email': 'other', 'company': 'c9'}], {'name': 0.2, 'email': 0.5, 'company': 0.3}, 0.5, 9)

Expected Output: []

Explanation: Records 7 and 8 share email (a component of their own), but the target 9 shares nothing, so its component is empty after excluding itself.

Hints

  1. Build the same direct-link adjacency as Part 2.
  2. Run BFS or DFS from the target; any node you can reach belongs to its connected component.
  3. Mark nodes visited as you enqueue them to avoid revisiting in cyclic graphs, then drop the target from the final set.
Last updated: Jun 21, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Assign Reviewers from Changed Files - Stripe (medium)
  • Generate Account Email Notifications - Stripe (medium)
  • Calculate Transaction Fees - Stripe (medium)
  • Build an Account Transfer Ledger - Stripe (medium)
  • Implement Validation and String Compression - Stripe (hard)