Match payments to invoices by memo or amount
Company: Google
Role: Data Scientist
Category: Coding & Algorithms
Interview Round: Technical Screen
Quick Answer: This Coding & Algorithms question for a Data Scientist evaluates string parsing and pattern matching, keyed and numeric record matching, date-based tie-breaking, and algorithmic complexity analysis, and is commonly asked to test handling of messy input formats and deterministic reconciliation rules.
Match Payments to Invoices by Memo
Constraints
- 1 <= len(invoices), len(payments) (both may also be empty -> return [])
- invoice_id is unique across invoices
- memo may be empty, None, or contain extra whitespace around the id
- The standardized reference, when present, is exactly the prefix 'Paying off:' followed by the invoice id
Examples
Input: ([{'invoice_id': 'INV-001', 'amount_cents': 5000, 'due_date': '2024-01-15'}, {'invoice_id': 'INV-002', 'amount_cents': 7500, 'due_date': '2024-02-01'}], [{'payment_id': 'PAY-1', 'amount_cents': 5000, 'memo': 'Paying off: INV-001'}, {'payment_id': 'PAY-2', 'amount_cents': 7500, 'memo': ' Paying off: INV-002 '}])
Expected Output: [{'payment_id': 'PAY-1', 'matched': True, 'invoice_id': 'INV-001', 'invoice': {'invoice_id': 'INV-001', 'amount_cents': 5000, 'due_date': '2024-01-15'}}, {'payment_id': 'PAY-2', 'matched': True, 'invoice_id': 'INV-002', 'invoice': {'invoice_id': 'INV-002', 'amount_cents': 7500, 'due_date': '2024-02-01'}}]
Explanation: Both memos carry a valid 'Paying off:' reference; the second has extra surrounding and internal whitespace which is stripped, yielding INV-002.
Input: ([{'invoice_id': 'INV-001', 'amount_cents': 5000, 'due_date': '2024-01-15'}], [{'payment_id': 'PAY-3', 'amount_cents': 5000, 'memo': 'Paying off: INV-999'}, {'payment_id': 'PAY-4', 'amount_cents': 5000, 'memo': ''}, {'payment_id': 'PAY-5', 'amount_cents': 5000, 'memo': 'random note'}])
Expected Output: [{'payment_id': 'PAY-3', 'matched': False, 'invoice_id': 'INV-999', 'reason': 'invoice not found'}, {'payment_id': 'PAY-4', 'matched': False, 'reason': 'no invoice reference in memo'}, {'payment_id': 'PAY-5', 'matched': False, 'reason': 'no invoice reference in memo'}]
Explanation: PAY-3 extracts INV-999 which is absent (invoice not found). PAY-4 has an empty memo and PAY-5 has no prefix, so both are 'no invoice reference'.
Input: ([], [])
Expected Output: []
Explanation: No payments to process, so the result list is empty.
Input: ([{'invoice_id': 'A1', 'amount_cents': 100, 'due_date': '2024-03-03'}], [{'payment_id': 'PX', 'amount_cents': 100, 'memo': 'Paying off:'}])
Expected Output: [{'payment_id': 'PX', 'matched': False, 'reason': 'no invoice reference in memo'}]
Explanation: The prefix is present but nothing follows it, so the extracted id is empty and treated as no reference.
Hints
- Build a dictionary mapping invoice_id -> invoice for O(1) lookups instead of scanning the invoice list per payment.
- Use str.find('Paying off:') to locate the prefix, then slice everything after it and .strip() to remove surrounding whitespace.
- Treat three failure modes the same 'no invoice reference' way: missing/empty memo, no prefix, and prefix present but nothing after it. A non-empty extracted id that simply isn't in the map is a distinct 'invoice not found' case.
Hybrid Payment Matching: Memo or Amount
Constraints
- invoices and payments may each be empty -> return []
- invoice_id is unique across invoices
- Multiple invoices may share the same amount_cents; the earliest due_date wins
- ISO-8601 due_date strings (YYYY-MM-DD) are lexicographically comparable, so string comparison gives correct chronological ordering
- memo may be None, empty, or contain whitespace around the id
Examples
Input: ([{'invoice_id': 'INV-001', 'amount_cents': 5000, 'due_date': '2024-01-15'}, {'invoice_id': 'INV-002', 'amount_cents': 7500, 'due_date': '2024-02-01'}], [{'payment_id': 'PAY-1', 'amount_cents': 5000, 'memo': 'Paying off: INV-001'}, {'payment_id': 'PAY-2', 'amount_cents': 7500, 'memo': 'monthly transfer'}])
Expected Output: [{'payment_id': 'PAY-1', 'matched': True, 'method': 'memo', 'invoice_id': 'INV-001', 'invoice': {'invoice_id': 'INV-001', 'amount_cents': 5000, 'due_date': '2024-01-15'}}, {'payment_id': 'PAY-2', 'matched': True, 'method': 'amount', 'invoice_id': 'INV-002', 'invoice': {'invoice_id': 'INV-002', 'amount_cents': 7500, 'due_date': '2024-02-01'}}]
Explanation: PAY-1 has a memo reference and matches by memo. PAY-2 has no prefix, so it falls back to amount matching and finds the unique 7500-cent invoice INV-002.
Input: ([{'invoice_id': 'A', 'amount_cents': 1000, 'due_date': '2024-05-10'}, {'invoice_id': 'B', 'amount_cents': 1000, 'due_date': '2024-03-01'}, {'invoice_id': 'C', 'amount_cents': 1000, 'due_date': '2024-04-20'}], [{'payment_id': 'P1', 'amount_cents': 1000, 'memo': 'no reference here'}])
Expected Output: [{'payment_id': 'P1', 'matched': True, 'method': 'amount', 'invoice_id': 'B', 'invoice': {'invoice_id': 'B', 'amount_cents': 1000, 'due_date': '2024-03-01'}}]
Explanation: Three invoices share amount 1000; with no memo reference, amount matching picks the earliest due_date (2024-03-01), which is invoice B.
Input: ([{'invoice_id': 'X', 'amount_cents': 200, 'due_date': '2024-01-01'}], [{'payment_id': 'PN', 'amount_cents': 999, 'memo': ''}, {'payment_id': 'PM', 'amount_cents': 200, 'memo': 'Paying off: NOPE'}])
Expected Output: [{'payment_id': 'PN', 'matched': False, 'method': 'amount', 'reason': 'cannot find matching invoice for amount 999'}, {'payment_id': 'PM', 'matched': False, 'method': 'memo', 'invoice_id': 'NOPE', 'reason': 'invoice not found'}]
Explanation: PN has no memo prefix and no invoice with amount 999, so amount matching fails. PM has a memo reference to NOPE which does not exist, so memo matching reports invoice not found.
Input: ([], [])
Expected Output: []
Explanation: No payments to process, so the result list is empty.
Hints
- Precompute two structures from invoices: a dict invoice_id -> invoice, and a dict amount_cents -> list of invoices. This avoids re-scanning all invoices for every payment.
- Branch on whether 'Paying off:' is in the memo: if so reuse the first problem's memo logic; otherwise go to the amount bucket.
- For the amount tiebreak, since due_date is ISO-8601 (YYYY-MM-DD), plain string min() on due_date already yields the earliest date with no date parsing.
- Tag every result with 'method' ('memo' or 'amount') so callers can tell which path produced the match.