PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCareers

Quick Overview

This question evaluates string parsing, record linkage, and algorithmic matching skills for data reconciliation tasks, including handling messy memo text, exact amount matches, and tie-breaking by earliest due date.

  • easy
  • Google
  • Coding & Algorithms
  • Data Scientist

Match payments to invoices by memo or amount

Company: Google

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Technical Screen

## Scenario You are building a small reconciliation tool that matches **payments** to **invoices**. ### Data structures Assume you are given: - `invoices`: a list of invoice objects/records with fields: - `invoice_id: string` (unique) - `amount: int` (or `decimal`; assume exact match is possible) - `due_date: string` in ISO format `YYYY-MM-DD` - `payments`: a list of payment objects/records with fields: - `payment_id: string` (unique) - `amount: int` - `memo: string` (free-form text; may contain extra spaces) ### Memo format Some payments have a *standard* memo format that includes an invoice id, e.g.: - `"Paying off: INV-12345 ..."` If the memo contains the substring `"Paying off:"`, then the invoice id immediately following it (after trimming spaces) is the target `invoice_id`. (You may assume the invoice id token ends at the next whitespace.) ## Task Implement matching logic to process each payment and produce a match result. ### Part 1 (ID-based) For payments whose memo contains `"Paying off:"`: 1. Parse the invoice id from the memo. 2. Find the invoice with that `invoice_id`. 3. Output a record indicating the payment matched that invoice. 4. If no such invoice exists, output an error for that payment (e.g., `"cannot find invoice_id=..."`). ### Part 2 (extended: ID-based OR amount-based) Extend the logic as follows: - If the memo contains `"Paying off:"`, use the **ID-based** logic from Part 1. - Otherwise, use **amount-based** matching: 1. Find all invoices with `invoice.amount == payment.amount`. 2. If none exist, output an error for that payment (e.g., `"cannot find matching invoice for amount=..."`). 3. If multiple invoices match by amount, choose the one with the **earliest** `due_date`. ## Output requirements Return a list of match results (one per payment), where each result includes: - `payment_id` - `matched_invoice_id` (or `null` if unmatched) - `match_mode`: one of `{ "id", "amount" }` (or `"unmatched"`) - optionally an `error_message` for unmatched cases ## Constraints / edge cases to handle - Extra spaces in `memo` around `"Paying off:"` and/or the invoice id token. - Large input sizes: design for near-linear time in the number of invoices + payments. - Multiple invoices can share the same amount; tie-break using earliest due date.

Quick Answer: This question evaluates string parsing, record linkage, and algorithmic matching skills for data reconciliation tasks, including handling messy memo text, exact amount matches, and tie-breaking by earliest due date.

Part 1: Match Payments to Invoices Using Memo Invoice IDs

Given a list of invoices and a list of payments, process each payment in order. If a payment memo contains the marker Paying off:, parse the invoice id token immediately after the marker. The token ends at the next whitespace after trimming spaces. If that invoice exists, match the payment to it using match_mode id. If the invoice does not exist, return an unmatched result with an error. If the memo does not contain the marker, or the marker is present but no invoice id follows it, also return an unmatched result.

Constraints

  • 0 <= len(invoices), len(payments) <= 200000
  • Each invoice_id is unique and each payment_id is unique
  • memo may contain extra spaces after the marker and before the invoice id token
  • Aim for near-linear time in the total number of invoices and payments

Examples

Input: ([{'invoice_id': 'INV-100', 'amount': 50, 'due_date': '2024-01-05'}, {'invoice_id': 'INV-200', 'amount': 75, 'due_date': '2024-01-10'}], [{'payment_id': 'P1', 'amount': 50, 'memo': 'Paying off: INV-100 thanks'}, {'payment_id': 'P2', 'amount': 75, 'memo': ' Paying off: INV-200'}])

Expected Output: [{'payment_id': 'P1', 'matched_invoice_id': 'INV-100', 'match_mode': 'id'}, {'payment_id': 'P2', 'matched_invoice_id': 'INV-200', 'match_mode': 'id'}]

Explanation: Both payments contain the marker and point to existing invoice ids. Extra spaces are ignored.

Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [{'payment_id': 'P3', 'amount': 10, 'memo': 'Paying off: INV-999'}])

Expected Output: [{'payment_id': 'P3', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'cannot find invoice_id=INV-999'}]

Explanation: The memo contains an invoice id token, but that invoice does not exist.

Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [{'payment_id': 'P4', 'amount': 10, 'memo': 'Invoice INV-1'}])

Expected Output: [{'payment_id': 'P4', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'missing memo marker'}]

Explanation: This payment does not contain the required marker.

Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [{'payment_id': 'P5', 'amount': 10, 'memo': 'Paying off: '}])

Expected Output: [{'payment_id': 'P5', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'missing invoice id in memo'}]

Explanation: Edge case: the marker exists, but there is no token after it.

Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [])

Expected Output: []

Explanation: Edge case: no payments means no results.

Solution

def solution(invoices, payments):
    invoice_by_id = {}
    for invoice in invoices:
        invoice_by_id[invoice['invoice_id']] = invoice

    marker = 'Paying off:'
    results = []

    for payment in payments:
        memo = payment.get('memo', '')
        idx = memo.find(marker)

        if idx == -1:
            results.append({
                'payment_id': payment['payment_id'],
                'matched_invoice_id': None,
                'match_mode': 'unmatched',
                'error_message': 'missing memo marker'
            })
            continue

        rest = memo[idx + len(marker):].strip()
        if not rest:
            results.append({
                'payment_id': payment['payment_id'],
                'matched_invoice_id': None,
                'match_mode': 'unmatched',
                'error_message': 'missing invoice id in memo'
            })
            continue

        invoice_id = rest.split()[0]
        if invoice_id in invoice_by_id:
            results.append({
                'payment_id': payment['payment_id'],
                'matched_invoice_id': invoice_id,
                'match_mode': 'id'
            })
        else:
            results.append({
                'payment_id': payment['payment_id'],
                'matched_invoice_id': None,
                'match_mode': 'unmatched',
                'error_message': f'cannot find invoice_id={invoice_id}'
            })

    return results

Time complexity: O(n + m), where n is the number of invoices and m is the number of payments. Space complexity: O(n).

Hints

  1. Build a hash map from invoice_id to invoice so lookup is O(1) per payment.
  2. After finding the marker, slice the remainder of the memo, strip spaces, and take the first whitespace-separated token.

Part 2: Match Payments by Invoice ID or by Amount

Given a list of invoices and a list of payments, process each payment in order. If a payment memo contains the marker Paying off:, use ID-based matching: parse the invoice id token immediately after the marker, trim spaces, and match that exact invoice id. If the marker is present but no invoice id follows it, return an unmatched result. If the marker is not present, use amount-based matching: find invoices whose amount equals the payment amount and choose the one with the earliest due_date. If multiple invoices still tie on due_date, choose the lexicographically smallest invoice_id to keep the result deterministic. If no match exists, return an unmatched result.

Constraints

  • 0 <= len(invoices), len(payments) <= 200000
  • Each invoice_id is unique and each payment_id is unique
  • due_date is in ISO format YYYY-MM-DD, so lexicographic string comparison matches chronological order
  • Amounts can repeat across invoices; for amount-based matching choose earliest due_date, then lexicographically smallest invoice_id if still tied

Examples

Input: ([{'invoice_id': 'INV-1', 'amount': 100, 'due_date': '2024-06-15'}, {'invoice_id': 'INV-2', 'amount': 100, 'due_date': '2024-05-01'}, {'invoice_id': 'INV-3', 'amount': 50, 'due_date': '2024-07-01'}], [{'payment_id': 'P1', 'amount': 50, 'memo': 'Paying off: INV-3 extra'}, {'payment_id': 'P2', 'amount': 100, 'memo': 'Bank transfer'}, {'payment_id': 'P3', 'amount': 75, 'memo': 'Unknown'}])

Expected Output: [{'payment_id': 'P1', 'matched_invoice_id': 'INV-3', 'match_mode': 'id'}, {'payment_id': 'P2', 'matched_invoice_id': 'INV-2', 'match_mode': 'amount'}, {'payment_id': 'P3', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'cannot find matching invoice for amount=75'}]

Explanation: P1 uses memo-based matching. P2 has no marker, so it matches by amount to the earliest due date among the 100-amount invoices. P3 has no amount match.

Input: ([{'invoice_id': 'INV-10', 'amount': 200, 'due_date': '2024-03-01'}], [{'payment_id': 'P4', 'amount': 200, 'memo': 'Paying off: INV-404'}])

Expected Output: [{'payment_id': 'P4', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'cannot find invoice_id=INV-404'}]

Explanation: Because the marker exists, the payment must use ID-based logic and does not fall back to amount-based matching.

Input: ([{'invoice_id': 'INV-20', 'amount': 30, 'due_date': '2024-04-01'}], [{'payment_id': 'P5', 'amount': 30, 'memo': 'Paying off: '}])

Expected Output: [{'payment_id': 'P5', 'matched_invoice_id': None, 'match_mode': 'unmatched', 'error_message': 'missing invoice id in memo'}]

Explanation: Edge case: the marker exists but no invoice id token follows it, so the payment is unmatched.

Input: ([{'invoice_id': 'A-2', 'amount': 60, 'due_date': '2024-01-01'}, {'invoice_id': 'A-1', 'amount': 60, 'due_date': '2024-01-01'}], [{'payment_id': 'P6', 'amount': 60, 'memo': 'wire'}])

Expected Output: [{'payment_id': 'P6', 'matched_invoice_id': 'A-1', 'match_mode': 'amount'}]

Explanation: Both invoices match by amount and due date, so the lexicographically smaller invoice_id is chosen.

Input: ([{'invoice_id': 'INV-1', 'amount': 10, 'due_date': '2024-02-01'}], [])

Expected Output: []

Explanation: Edge case: no payments means the result list is empty.

Solution

def solution(invoices, payments):
    invoice_by_id = {}
    best_by_amount = {}

    for invoice in invoices:
        invoice_id = invoice['invoice_id']
        amount = invoice['amount']
        due_date = invoice['due_date']
        invoice_by_id[invoice_id] = invoice

        current = best_by_amount.get(amount)
        if current is None:
            best_by_amount[amount] = invoice
        else:
            current_key = (current['due_date'], current['invoice_id'])
            candidate_key = (due_date, invoice_id)
            if candidate_key < current_key:
                best_by_amount[amount] = invoice

    marker = 'Paying off:'
    results = []

    for payment in payments:
        memo = payment.get('memo', '')
        idx = memo.find(marker)

        if idx != -1:
            rest = memo[idx + len(marker):].strip()
            if not rest:
                results.append({
                    'payment_id': payment['payment_id'],
                    'matched_invoice_id': None,
                    'match_mode': 'unmatched',
                    'error_message': 'missing invoice id in memo'
                })
                continue

            invoice_id = rest.split()[0]
            if invoice_id in invoice_by_id:
                results.append({
                    'payment_id': payment['payment_id'],
                    'matched_invoice_id': invoice_id,
                    'match_mode': 'id'
                })
            else:
                results.append({
                    'payment_id': payment['payment_id'],
                    'matched_invoice_id': None,
                    'match_mode': 'unmatched',
                    'error_message': f'cannot find invoice_id={invoice_id}'
                })
        else:
            amount = payment['amount']
            match = best_by_amount.get(amount)
            if match is None:
                results.append({
                    'payment_id': payment['payment_id'],
                    'matched_invoice_id': None,
                    'match_mode': 'unmatched',
                    'error_message': f'cannot find matching invoice for amount={amount}'
                })
            else:
                results.append({
                    'payment_id': payment['payment_id'],
                    'matched_invoice_id': match['invoice_id'],
                    'match_mode': 'amount'
                })

    return results

Time complexity: O(n + m), where n is the number of invoices and m is the number of payments. Space complexity: O(n).

Hints

  1. Precompute two structures: invoice_id -> invoice and amount -> best invoice for that amount.
  2. If the marker Paying off: exists, do not fall back to amount-based matching even if the parsed invoice id is missing or invalid.
Last updated: May 5, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • Careers
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Compute Turnstile Crossing Times - Google (hard)
  • Simulate In-Place Cellular State Updates - Google (hard)
  • Determine Whether a Word Exists in a Graph - Google (medium)
  • Solve Shortest Paths and Rental Allocation - Google (medium)
  • Solve Two Array Optimization Problems - Google (medium)