Detect recurring transactions from a ledger
Company: Ramp
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
You are given a list of credit/debit card transactions. Each transaction has:
- `transaction_date` (YYYY-MM-DD)
- `merchant` (string)
- `amount` (decimal, positive)
- `currency` (string, e.g., `USD`)
Many transactions are one-off (e.g., small public transit fares), while some are recurring subscriptions/bills (e.g., Netflix).
## Task
Identify **recurring transactions** and print a summary line per detected recurring pattern, in the form:
- `merchant, <amount> per <period>`
Example output:
- `Netflix, $30 per month`
- `GymCo, $12.99 per week`
## Requirements / Clarifications
1. Recurring transactions should be detected even if there are many non-recurring transactions from the same merchant (e.g., repeated but irregular small charges).
2. Your solution should support multiple recurrence periods, at least:
- daily
- weekly
- monthly
(bonus: yearly)
3. Assume transactions for different currencies should **not** be mixed (i.e., treat `(merchant, currency)` as separate streams unless you explicitly design conversion).
4. Define and justify what “recurring” means (e.g., minimum number of occurrences, acceptable date jitter for monthly billing, and whether amount must match exactly or within a tolerance).
## Input/Output
- **Input:** an array/list of transactions in arbitrary order.
- **Output:** a list of recurring summaries. If multiple recurring patterns exist for the same merchant (e.g., two different subscription tiers), include each separately.
## Constraints (assume)
- Up to 100,000 transactions.
- Dates span up to several years.
## Example
Input (abridged):
- 2024-01-05, Netflix, 30.00, USD
- 2024-02-05, Netflix, 30.00, USD
- 2024-03-05, Netflix, 30.00, USD
- 2024-01-02, CityTransit, 2.75, USD
- 2024-01-09, CityTransit, 2.75, USD
- 2024-01-13, CityTransit, 2.75, USD (may or may not be considered “recurring” depending on your definition)
Expected output includes something like:
- `Netflix, $30.00 per month`
Explain your approach and the time/space complexity.
Quick Answer: This question evaluates skills in detecting temporal recurrence patterns in transaction data, including time-series analysis, date arithmetic, grouping by merchant and currency, and tolerance handling for amounts and date jitter.
You are given a ledger of transactions. Each transaction is a tuple of the form (transaction_date, merchant, amount, currency), where transaction_date is a string in YYYY-MM-DD format.
A transaction group is defined by having the same merchant, amount, and currency.
Your task is to find which groups are recurring. For this problem, a group is considered recurring if, after sorting its dates, it contains a streak of at least 3 transactions that follows one of these exact schedules:
- DAILY: each adjacent transaction in the streak is exactly 1 day apart
- WEEKLY: each adjacent transaction in the streak is exactly 7 days apart
- MONTHLY: each adjacent transaction is in the next calendar month, and either the day of month is the same or both dates are the last day of their month
A streak can be any contiguous block after sorting the dates for that group. Extra off-schedule transactions before or after the streak do not matter.
For each group, return only one frequency:
- choose the frequency with the longest valid streak
- if multiple frequencies have the same longest streak, prefer DAILY over WEEKLY over MONTHLY
Return all detected recurring groups as tuples in the form (merchant, amount, currency, frequency), sorted by merchant, then amount, then currency.
Constraints
- 0 <= len(transactions) <= 100000
- Each transaction_date is a valid date string in YYYY-MM-DD format
- Transactions are considered part of the same group only if merchant, amount, and currency all match exactly
- A group must have a valid streak of at least 3 transactions to be considered recurring
Examples
Input: ([],)
Expected Output: []
Explanation: With no transactions, there are no recurring groups.
Input: ([('2024-01-05', 'Netflix', 30, 'USD'), ('2024-02-05', 'Netflix', 30, 'USD'), ('2024-03-05', 'Netflix', 30, 'USD'), ('2024-01-03', 'Gym', 25, 'USD'), ('2024-01-10', 'Gym', 25, 'USD'), ('2024-01-17', 'Gym', 25, 'USD'), ('2024-01-02', 'Bus', 2.75, 'USD'), ('2024-01-04', 'Bus', 2.75, 'USD'), ('2024-01-11', 'Bus', 3.0, 'USD')],)
Expected Output: [('Gym', 25, 'USD', 'WEEKLY'), ('Netflix', 30, 'USD', 'MONTHLY')]
Explanation: Netflix appears on the same day in three consecutive months, and Gym repeats every 7 days for three transactions. Bus charges do not form a qualifying recurring streak.
Input: ([('2024-01-31', 'Rent', 1200, 'USD'), ('2024-02-29', 'Rent', 1200, 'USD'), ('2024-03-31', 'Rent', 1200, 'USD'), ('2024-04-15', 'Rent', 1200, 'USD')],)
Expected Output: [('Rent', 1200, 'USD', 'MONTHLY')]
Explanation: Jan 31, Feb 29, and Mar 31 are all month-end dates in consecutive months, so they form a monthly streak of length 3. The April 15 transaction does not extend the streak.
Input: ([('2024-06-01', 'StreakApp', 1, 'USD'), ('2024-06-02', 'StreakApp', 1, 'USD'), ('2024-06-03', 'StreakApp', 1, 'USD'), ('2024-06-05', 'StreakApp', 1, 'USD')],)
Expected Output: [('StreakApp', 1, 'USD', 'DAILY')]
Explanation: June 1, 2, and 3 form a daily streak of length 3. The June 5 transaction breaks the streak but does not remove the earlier recurring pattern.
Input: ([('2024-01-01', 'Magazine', 10, 'USD'), ('2024-02-01', 'Magazine', 10, 'USD')],)
Expected Output: []
Explanation: Only two matching transactions exist, which is below the minimum of 3 needed for a recurring pattern.
Hints
- First group transactions by merchant, amount, and currency. Then you only need to analyze dates within each group.
- After sorting dates in a group, look for the longest consecutive streak for each frequency type by checking adjacent date gaps.