Resolve routing-number to bank mapping
Company: Plaid
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's skills in data normalization, deduplication, aggregation, and conflict resolution across multiple JSON data sources, with emphasis on canonicalization of aliases and handling inconsistent mappings.
Part 1: Normalize Routing-Number Bank Mappings
Constraints
- 0 <= len(records) <= 100000
- 0 <= total number of alias strings across all banks <= 100000
- Each record contains keys 'routing_number' and 'bank_name'
- Routing numbers should be treated as strings in the output
- After normalization, alias groups do not overlap: no normalized alias or canonical name belongs to more than one bank
Examples
Input: ([{'routing_number': '111000025', 'bank_name': ' Bank of America '}, {'routing_number': '111000025', 'bank_name': 'BOFA'}, {'routing_number': '021000021', 'bank_name': ' chase '}, {'routing_number': '021000021', 'bank_name': 'JPMORGAN CHASE'}], {'Bank of America': ['BOFA', 'B.O.A.', 'Bank of america'], 'Chase': ['JPMorgan Chase', 'JP Morgan Chase']})
Expected Output: {'111000025': 'Bank of America', '021000021': 'Chase'}
Explanation: Both records for 111000025 resolve to Bank of America, and both records for 021000021 resolve to Chase after case-insensitive normalization and alias matching.
Input: ([{'routing_number': '111', 'bank_name': 'alpha bank'}, {'routing_number': '111', 'bank_name': 'Beta Bank'}, {'routing_number': '111', 'bank_name': ' ALPHA BANK '}], {'Alpha Bank': ['Alpha bank'], 'Beta Bank': ['beta bank']})
Expected Output: {'111': 'CONFLICT'}
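Explanation: Routing number 111 resolves to both Alpha Bank and Beta Bank, so the final value must be CONFLICT. Once a routing number is conflicted it stays conflicted: the third record, which resolves to Alpha Bank again, does not clear the conflict.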
Input: ([{'routing_number': '333', 'bank_name': ' First National Bank '}, {'routing_number': '333', 'bank_name': 'first national bank'}], {})
Expected Output: {'333': 'first national bank'}
Explanation: This bank does not appear in the alias dictionary, so its normalized form (trimmed and lowercased) becomes its canonical name.
Input: ([], {'Bank of America': ['BOFA']})
Expected Output: {}
Explanation: With no records, the result is an empty mapping.
Input: ([{'routing_number': '555', 'bank_name': ' wElLs fArGo '}], {'Wells Fargo': ['Wells Fargo Bank']})
Expected Output: {'555': 'Wells Fargo'}
Explanation: A canonical name must also match itself after normalization, even though it is not listed among its own aliases. Here ' wElLs fArGo ' matches the canonical name Wells Fargo rather than the alias Wells Fargo Bank.
Hints
- Build one reverse hash map from every known alias to its canonical bank name before processing the records.
- For each routing number, remember the first resolved bank. If a later resolved bank is different, mark that routing number as a conflict.
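The two hints above can be sketched as follows. This is a minimal sketch, not a reference solution: the function names `normalize`, `build_alias_lookup`, and `resolve_routing_numbers` are hypothetical, and it assumes that "normalization" means trimming surrounding whitespace and lowercasing, which is all the examples require.

```python
CONFLICT = "CONFLICT"


def normalize(name: str) -> str:
    """Comparison key: trim surrounding whitespace and lowercase (assumed rule)."""
    return name.strip().lower()


def build_alias_lookup(aliases: dict) -> dict:
    """Reverse map: normalized alias or canonical name -> canonical name."""
    lookup = {}
    for canonical, names in aliases.items():
        # A canonical name must match itself even if it is not listed as an alias.
        lookup[normalize(canonical)] = canonical
        for alias in names:
            lookup[normalize(alias)] = canonical
    return lookup


def resolve_routing_numbers(records: list, aliases: dict) -> dict:
    """Map each routing number to one canonical bank name, or to CONFLICT."""
    lookup = build_alias_lookup(aliases)
    result = {}
    for record in records:
        routing = str(record["routing_number"])  # routing numbers stay strings
        key = normalize(record["bank_name"])
        # Unknown banks fall back to their own normalized form.
        bank = lookup.get(key, key)
        first_seen = result.setdefault(routing, bank)
        if first_seen != bank:
            # Also true when first_seen is already CONFLICT, so conflicts persist.
            result[routing] = CONFLICT
    return result
```

Building the lookup costs time linear in the total number of alias strings, and each record is then resolved with a single hash lookup, so the whole pass stays linear in the input size.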
Part 2: Resolve Routing Numbers Across Multiple Data Sources
Constraints
- 0 <= len(sources) <= 10000
- The total number of records across all sources is at most 100000
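- Each source contains keys 'source', 'weight', and 'records'; the examples below abbreviate this to a list of per-source record lists of (routing_number, bank_name) pairs, a parallel list of weights, and a map from alias to canonical name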
- Each weight is a positive integer
- After normalization, alias groups do not overlap: no normalized alias or canonical name belongs to more than one bank
Examples
Input: ([[('111', ' Bank of America '), ('111', 'boa')], [('111', 'Bank of America')]], [2, 1], {'boa': 'Bank of America'})
Expected Output: {'111': 'Bank of America'}
Explanation: In the first source, both names normalize to Bank of America, so that source casts one vote worth 2. The second source adds another vote worth 1. Bank of America wins with total weight 3.
Input: ([[('222', 'Chase'), ('222', ' JP Morgan Chase ')], [('222', 'Wells Fargo')], [('222', ' chase ')]], [3, 4, 2], {'jp morgan chase': 'Chase'})
Expected Output: {'222': 'Chase'}
Explanation: Source 1 contributes 3 to Chase because both records are equivalent after alias normalization. Source 2 contributes 4 to Wells Fargo. Source 3 contributes 2 to Chase. Final totals: Chase 5, Wells Fargo 4.
Input: ([[('333', 'Citi')], [('333', 'Wells Fargo')]], [3, 3], {})
Expected Output: {'333': 'AMBIGUOUS'}
Explanation: Citi and Wells Fargo both receive total weight 3, so the routing number is ambiguous.
Input: ([[('444', 'Alpha Bank'), ('444', 'Beta Bank')], [('444', ' alpha bank '), ('444', 'Beta Bank')]], [5, 1], {})
Expected Output: {'444': 'AMBIGUOUS'}
Explanation: Each source has conflicting normalized banks for routing 444, so neither source casts a valid vote. Since the routing number appeared but no valid votes remain, the result is AMBIGUOUS.
Input: ([[('001', ' First Bank '), ('002', 'Second Bank'), ('002', 'second bank')], [], [('001', 'first bank'), ('003', 'Third Bank'), ('003', 'Third Bank')]], [1, 5, 2], {})
Expected Output: {'001': 'First Bank', '002': 'Second Bank', '003': 'Third Bank'}
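Explanation: Routing 001 gets a vote worth 1 from source 1 and a vote worth 2 from source 3 for the same normalized bank. Routing 002 appears twice in source 1 with equivalent names, so that source casts one vote. Routing 003 appears twice in source 3 with the same bank, which still counts as one vote. Source 2 has no records, so its weight of 5 is never used. Note that, unlike Part 1, the expected output reports names absent from the alias map in their trimmed original casing rather than lowercased.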
Input: ([], [], {})
Expected Output: {}
Explanation: No routing numbers appear in the input, so the result is an empty dictionary.
Hints
- Resolve duplicates and contradictions inside each source before doing any cross-source voting.
- Use a nested hash map like routing -> bank -> total score to accumulate weighted votes.
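The two hints above can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution. The function name `resolve_across_sources` is hypothetical, and the input shape follows the examples: a list of per-source record lists, a parallel list of weights, and a map from alias to canonical name. It also assumes that a bank absent from the alias map is reported under the first trimmed spelling seen, which is what the expected outputs suggest but the problem does not state.

```python
from collections import defaultdict

AMBIGUOUS = "AMBIGUOUS"


def normalize(name: str) -> str:
    """Comparison key: trim surrounding whitespace and lowercase (assumed rule)."""
    return name.strip().lower()


def resolve_across_sources(sources: list, weights: list, aliases: dict) -> dict:
    # Normalized alias or canonical name -> canonical name.
    canonical_of = {}
    for alias, canonical in aliases.items():
        canonical_of[normalize(alias)] = canonical
        canonical_of[normalize(canonical)] = canonical

    display = {}  # comparison key -> name to report (assumption: first seen)
    scores = defaultdict(lambda: defaultdict(int))  # routing -> bank key -> weight

    for records, weight in zip(sources, weights):
        # Step 1: collapse duplicates and detect contradictions inside the source.
        banks_in_source = defaultdict(set)  # routing -> bank keys in this source
        for routing, raw_name in records:
            key = normalize(raw_name)
            if key in canonical_of:
                name = canonical_of[key]
                key = normalize(name)
            else:
                name = raw_name.strip()
            display.setdefault(key, name)
            banks_in_source[str(routing)].add(key)

        # Step 2: each source casts at most one vote per routing number.
        for routing, keys in banks_in_source.items():
            votes = scores[routing]  # registers the routing number as seen
            if len(keys) == 1:  # a self-contradicting source casts no vote
                votes[next(iter(keys))] += weight

    result = {}
    for routing, votes in scores.items():
        ranked = sorted(votes.values(), reverse=True)
        # No valid votes, or a tie for first place, is ambiguous.
        if not ranked or (len(ranked) > 1 and ranked[0] == ranked[1]):
            result[routing] = AMBIGUOUS
        else:
            result[routing] = display[max(votes, key=votes.get)]
    return result
```

Collecting a set of bank keys per routing number inside each source is what makes duplicate records count once and contradictory records count for nothing, before any weight is added to the cross-source tally.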