Match payments to invoices by memo or amount
Company: Google
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Technical Screen
Quick Answer: This question evaluates string parsing, record linkage, and algorithmic matching skills for data reconciliation tasks, including handling messy memo text, exact amount matches, and tie-breaking by earliest due date.
Part 1: Match Payments to Invoices Using Memo Invoice IDs
Constraints
- 0 <= len(invoices), len(payments) <= 200000
- Each invoice_id is unique and each payment_id is unique
- memo may contain extra spaces after the marker and before the invoice id token
- Aim for near-linear time in the total number of invoices and payments
Examples
Input: ([{'invoice_id': 'INV-100', 'amount': 50, 'due_date': '2024-01-05'}, {'invoice_id': 'INV-200', 'amount': 75, 'due_date': '2024-01-10'}], [{'payment_id': 'P1', 'amount': 50, 'memo': 'Paying off: INV-100 thanks'}, {'payment_id': 'P2', 'amount': 75, 'memo': ' Paying off: INV-200'}])
Expected Output: [{'payment_id': 'P1', 'matched_invoice_id': 'INV-100', 'match_mode': 'id'}, {'payment_id': 'P2', 'matched_invoice_id': 'INV-200', 'match_mode': 'id'}]
Explanation: Both payments contain the marker and point to existing invoice ids. Extra spaces are ignored.
Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [{'payment_id': 'P3', 'amount': 10, 'memo': 'Paying off: INV-999'}])
Expected Output: [{'payment_id': 'P3', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'cannot find invoice_id=INV-999'}]
Explanation: The memo contains an invoice id token, but that invoice does not exist.
Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [{'payment_id': 'P4', 'amount': 10, 'memo': 'Invoice INV-1'}])
Expected Output: [{'payment_id': 'P4', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'missing memo marker'}]
Explanation: This payment does not contain the required marker.
Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [{'payment_id': 'P5', 'amount': 10, 'memo': 'Paying off: '}])
Expected Output: [{'payment_id': 'P5', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'missing invoice id in memo'}]
Explanation: Edge case: the marker exists, but there is no token after it.
Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [])
Expected Output: []
Explanation: Edge case: no payments means no results.
Hints
- Build a hash map from invoice_id to invoice so lookup is O(1) per payment.
- After finding the marker, slice the remainder of the memo, strip spaces, and take the first whitespace-separated token.
Part 2: Match Payments by Invoice ID or by Amount
Constraints
- 0 <= len(invoices), len(payments) <= 200000
- Each invoice_id is unique and each payment_id is unique
- due_date is in ISO format YYYY-MM-DD, so lexicographic string comparison matches chronological order
- Amounts can repeat across invoices; for amount-based matching choose earliest due_date, then lexicographically smallest invoice_id if still tied
Examples
Input: ([{'invoice_id': 'INV-1', 'amount': 100, 'due_date': '2024-06-15'}, {'invoice_id': 'INV-2', 'amount': 100, 'due_date': '2024-05-01'}, {'invoice_id': 'INV-3', 'amount': 50, 'due_date': '2024-07-01'}], [{'payment_id': 'P1', 'amount': 50, 'memo': 'Paying off: INV-3 extra'}, {'payment_id': 'P2', 'amount': 100, 'memo': 'Bank transfer'}, {'payment_id': 'P3', 'amount': 75, 'memo': 'Unknown'}])
Expected Output: [{'payment_id': 'P1', 'matched_invoice_id': 'INV-3', 'match_mode': 'id'}, {'payment_id': 'P2', 'matched_invoice_id': 'INV-2', 'match_mode': 'amount'}, {'payment_id': 'P3', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'cannot find matching invoice for amount=75'}]
Explanation: P1 uses memo-based matching. P2 has no marker, so it matches by amount to the earliest due date among the 100-amount invoices. P3 has no amount match.
Input: ([{'invoice_id': 'INV-10', 'amount': 200, 'due_date': '2024-03-01'}], [{'payment_id': 'P4', 'amount': 200, 'memo': 'Paying off: INV-404'}])
Expected Output: [{'payment_id': 'P4', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'cannot find invoice_id=INV-404'}]
Explanation: Because the marker exists, the payment must use ID-based logic and does not fall back to amount-based matching.
Input: ([{'invoice_id': 'INV-20', 'amount': 30, 'due_date': '2024-04-01'}], [{'payment_id': 'P5', 'amount': 30, 'memo': 'Paying off: '}])
Expected Output: [{'payment_id': 'P5', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'missing invoice id in memo'}]
Explanation: Edge case: the marker exists but no invoice id token follows it, so the payment is unmatched.
Input: ([{'invoice_id': 'A-2', 'amount': 60, 'due_date': '2024-01-01'}, {'invoice_id': 'A-1', 'amount': 60, 'due_date': '2024-01-01'}], [{'payment_id': 'P6', 'amount': 60, 'memo': 'wire'}])
Expected Output: [{'payment_id': 'P6', 'matched_invoice_id': 'A-1', 'match_mode': 'amount'}]
Explanation: Both invoices match by amount and due date, so the lexicographically smaller invoice_id is chosen.
Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [])
Expected Output: []
Explanation: Edge case: no payments means the result list is empty.
Hints
- Precompute two structures: invoice_id -> invoice and amount -> best invoice for that amount.
- If the marker Paying off: exists, do not fall back to amount-based matching even if the parsed invoice id is missing or invalid.