Match payments to invoices by memo or amount
Company: Google
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Technical Screen
Quick Answer: This question evaluates string parsing, record linkage, and algorithmic matching skills for data reconciliation tasks, including handling messy memo text, exact amount matches, and tie-breaking by earliest due date.
Part 1: Match Payments to Invoices Using Memo Invoice IDs
Constraints
- 0 <= len(invoices), len(payments) <= 200000
- Each invoice_id is unique and each payment_id is unique
- memo may contain extra spaces after the marker and before the invoice id token
- Aim for near-linear time in the total number of invoices and payments
Examples
Input: ([{'invoice_id': 'INV-100', 'amount': 50, 'due_date': '2024-01-05'}, {'invoice_id': 'INV-200', 'amount': 75, 'due_date': '2024-01-10'}], [{'payment_id': 'P1', 'amount': 50, 'memo': 'Paying off: INV-100 thanks'}, {'payment_id': 'P2', 'amount': 75, 'memo': ' Paying off: INV-200'}])
Expected Output: [{'payment_id': 'P1', 'matched_invoice_id': 'INV-100', 'match_mode': 'id'}, {'payment_id': 'P2', 'matched_invoice_id': 'INV-200', 'match_mode': 'id'}]
Explanation: Both payments contain the marker and point to existing invoice ids. Extra spaces are ignored.
Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [{'payment_id': 'P3', 'amount': 10, 'memo': 'Paying off: INV-999'}])
Expected Output: [{'payment_id': 'P3', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'cannot find invoice_id=INV-999'}]
Explanation: The memo contains an invoice id token, but that invoice does not exist.
Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [{'payment_id': 'P4', 'amount': 10, 'memo': 'Invoice INV-1'}])
Expected Output: [{'payment_id': 'P4', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'missing memo marker'}]
Explanation: This payment does not contain the required marker.
Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [{'payment_id': 'P5', 'amount': 10, 'memo': 'Paying off: '}])
Expected Output: [{'payment_id': 'P5', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'missing invoice id in memo'}]
Explanation: Edge case: the marker exists, but there is no token after it.
Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [])
Expected Output: []
Explanation: Edge case: no payments means no results.
Solution
def solution(invoices, payments):
invoice_by_id = {}
for invoice in invoices:
invoice_by_id[invoice['invoice_id']] = invoice
marker = 'Paying off:'
results = []
for payment in payments:
memo = payment.get('memo', '')
idx = memo.find(marker)
if idx == -1:
results.append({
'payment_id': payment['payment_id'],
'matched_invoice_id': None,
'match_mode': 'unmatched',
'error_message': 'missing memo marker'
})
continue
rest = memo[idx + len(marker):].strip()
if not rest:
results.append({
'payment_id': payment['payment_id'],
'matched_invoice_id': None,
'match_mode': 'unmatched',
'error_message': 'missing invoice id in memo'
})
continue
invoice_id = rest.split()[0]
if invoice_id in invoice_by_id:
results.append({
'payment_id': payment['payment_id'],
'matched_invoice_id': invoice_id,
'match_mode': 'id'
})
else:
results.append({
'payment_id': payment['payment_id'],
'matched_invoice_id': None,
'match_mode': 'unmatched',
'error_message': f'cannot find invoice_id={invoice_id}'
})
return resultsTime complexity: O(n + m), where n is the number of invoices and m is the number of payments. Space complexity: O(n).
Hints
- Build a hash map from invoice_id to invoice so lookup is O(1) per payment.
- After finding the marker, slice the remainder of the memo, strip spaces, and take the first whitespace-separated token.
Part 2: Match Payments by Invoice ID or by Amount
Constraints
- 0 <= len(invoices), len(payments) <= 200000
- Each invoice_id is unique and each payment_id is unique
- due_date is in ISO format YYYY-MM-DD, so lexicographic string comparison matches chronological order
- Amounts can repeat across invoices; for amount-based matching choose earliest due_date, then lexicographically smallest invoice_id if still tied
Examples
Input: ([{'invoice_id': 'INV-1', 'amount': 100, 'due_date': '2024-06-15'}, {'invoice_id': 'INV-2', 'amount': 100, 'due_date': '2024-05-01'}, {'invoice_id': 'INV-3', 'amount': 50, 'due_date': '2024-07-01'}], [{'payment_id': 'P1', 'amount': 50, 'memo': 'Paying off: INV-3 extra'}, {'payment_id': 'P2', 'amount': 100, 'memo': 'Bank transfer'}, {'payment_id': 'P3', 'amount': 75, 'memo': 'Unknown'}])
Expected Output: [{'payment_id': 'P1', 'matched_invoice_id': 'INV-3', 'match_mode': 'id'}, {'payment_id': 'P2', 'matched_invoice_id': 'INV-2', 'match_mode': 'amount'}, {'payment_id': 'P3', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'cannot find matching invoice for amount=75'}]
Explanation: P1 uses memo-based matching. P2 has no marker, so it matches by amount to the earliest due date among the 100-amount invoices. P3 has no amount match.
Input: ([{'invoice_id': 'INV-10', 'amount': 200, 'due_date': '2024-03-01'}], [{'payment_id': 'P4', 'amount': 200, 'memo': 'Paying off: INV-404'}])
Expected Output: [{'payment_id': 'P4', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'cannot find invoice_id=INV-404'}]
Explanation: Because the marker exists, the payment must use ID-based logic and does not fall back to amount-based matching.
Input: ([{'invoice_id': 'INV-20', 'amount': 30, 'due_date': '2024-04-01'}], [{'payment_id': 'P5', 'amount': 30, 'memo': 'Paying off: '}])
Expected Output: [{'payment_id': 'P5', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'missing invoice id in memo'}]
Explanation: Edge case: the marker exists but no invoice id token follows it, so the payment is unmatched.
Input: ([{'invoice_id': 'A-2', 'amount': 60, 'due_date': '2024-01-01'}, {'invoice_id': 'A-1', 'amount': 60, 'due_date': '2024-01-01'}], [{'payment_id': 'P6', 'amount': 60, 'memo': 'wire'}])
Expected Output: [{'payment_id': 'P6', 'matched_invoice_id': 'A-1', 'match_mode': 'amount'}]
Explanation: Both invoices match by amount and due date, so the lexicographically smaller invoice_id is chosen.
Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [])
Expected Output: []
Explanation: Edge case: no payments means the result list is empty.
Solution
def solution(invoices, payments):
invoice_by_id = {}
best_by_amount = {}
for invoice in invoices:
invoice_id = invoice['invoice_id']
amount = invoice['amount']
due_date = invoice['due_date']
invoice_by_id[invoice_id] = invoice
current = best_by_amount.get(amount)
if current is None:
best_by_amount[amount] = invoice
else:
current_key = (current['due_date'], current['invoice_id'])
candidate_key = (due_date, invoice_id)
if candidate_key < current_key:
best_by_amount[amount] = invoice
marker = 'Paying off:'
results = []
for payment in payments:
memo = payment.get('memo', '')
idx = memo.find(marker)
if idx != -1:
rest = memo[idx + len(marker):].strip()
if not rest:
results.append({
'payment_id': payment['payment_id'],
'matched_invoice_id': None,
'match_mode': 'unmatched',
'error_message': 'missing invoice id in memo'
})
continue
invoice_id = rest.split()[0]
if invoice_id in invoice_by_id:
results.append({
'payment_id': payment['payment_id'],
'matched_invoice_id': invoice_id,
'match_mode': 'id'
})
else:
results.append({
'payment_id': payment['payment_id'],
'matched_invoice_id': None,
'match_mode': 'unmatched',
'error_message': f'cannot find invoice_id={invoice_id}'
})
else:
amount = payment['amount']
match = best_by_amount.get(amount)
if match is None:
results.append({
'payment_id': payment['payment_id'],
'matched_invoice_id': None,
'match_mode': 'unmatched',
'error_message': f'cannot find matching invoice for amount={amount}'
})
else:
results.append({
'payment_id': payment['payment_id'],
'matched_invoice_id': match['invoice_id'],
'match_mode': 'amount'
})
return resultsTime complexity: O(n + m), where n is the number of invoices and m is the number of payments. Space complexity: O(n).
Hints
- Precompute two structures: invoice_id -> invoice and amount -> best invoice for that amount.
- If the marker Paying off: exists, do not fall back to amount-based matching even if the parsed invoice id is missing or invalid.