Detect recurring transactions from a ledger
Company: Ramp
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates skills in detecting temporal recurrence patterns in transaction data, including time-series analysis, date arithmetic, grouping by merchant and currency, and tolerance handling for amounts and date jitter.
Constraints
- 0 <= len(transactions) <= 100000
- Each transaction_date is a valid date string in YYYY-MM-DD format
- Transactions are considered part of the same group only if merchant, amount, and currency all match exactly
- A group must have a valid streak of at least 3 transactions to be considered recurring
Examples
Input: ([],)
Expected Output: []
Explanation: With no transactions, there are no recurring groups.
Input: ([('2024-01-05', 'Netflix', 30, 'USD'), ('2024-02-05', 'Netflix', 30, 'USD'), ('2024-03-05', 'Netflix', 30, 'USD'), ('2024-01-03', 'Gym', 25, 'USD'), ('2024-01-10', 'Gym', 25, 'USD'), ('2024-01-17', 'Gym', 25, 'USD'), ('2024-01-02', 'Bus', 2.75, 'USD'), ('2024-01-04', 'Bus', 2.75, 'USD'), ('2024-01-11', 'Bus', 3.0, 'USD')],)
Expected Output: [('Gym', 25, 'USD', 'WEEKLY'), ('Netflix', 30, 'USD', 'MONTHLY')]
Explanation: Netflix appears on the same day in three consecutive months, and Gym repeats every 7 days for three transactions. Bus charges do not form a qualifying recurring streak.
Input: ([('2024-01-31', 'Rent', 1200, 'USD'), ('2024-02-29', 'Rent', 1200, 'USD'), ('2024-03-31', 'Rent', 1200, 'USD'), ('2024-04-15', 'Rent', 1200, 'USD')],)
Expected Output: [('Rent', 1200, 'USD', 'MONTHLY')]
Explanation: Jan 31, Feb 29, and Mar 31 are all month-end dates in consecutive months, so they form a monthly streak of length 3. The April 15 transaction does not extend the streak.
Input: ([('2024-06-01', 'StreakApp', 1, 'USD'), ('2024-06-02', 'StreakApp', 1, 'USD'), ('2024-06-03', 'StreakApp', 1, 'USD'), ('2024-06-05', 'StreakApp', 1, 'USD')],)
Expected Output: [('StreakApp', 1, 'USD', 'DAILY')]
Explanation: June 1, 2, and 3 form a daily streak of length 3. The June 5 transaction breaks the streak but does not remove the earlier recurring pattern.
Input: ([('2024-01-01', 'Magazine', 10, 'USD'), ('2024-02-01', 'Magazine', 10, 'USD')],)
Expected Output: []
Explanation: Only two matching transactions exist, which is below the minimum of 3 needed for a recurring pattern.
Solution
def solution(transactions):
from collections import defaultdict
from datetime import datetime
import calendar
def parse_date(s):
return datetime.strptime(s, '%Y-%m-%d').date()
def is_last_day(d):
return d.day == calendar.monthrange(d.year, d.month)[1]
def is_monthly_step(a, b):
next_year = a.year + (1 if a.month == 12 else 0)
next_month = 1 if a.month == 12 else a.month + 1
if (b.year, b.month) != (next_year, next_month):
return False
return a.day == b.day or (is_last_day(a) and is_last_day(b))
def longest_streak(dates, kind):
if not dates:
return 0
best = 1
cur = 1
for i in range(1, len(dates)):
prev = dates[i - 1]
curr = dates[i]
if kind == 'DAILY':
ok = (curr - prev).days == 1
elif kind == 'WEEKLY':
ok = (curr - prev).days == 7
else:
ok = is_monthly_step(prev, curr)
if ok:
cur += 1
else:
cur = 1
if cur > best:
best = cur
return best
groups = defaultdict(list)
for date_str, merchant, amount, currency in transactions:
groups[(merchant, amount, currency)].append(parse_date(date_str))
order = ['DAILY', 'WEEKLY', 'MONTHLY']
result = []
for (merchant, amount, currency), dates in groups.items():
dates.sort()
best_kind = None
best_len = 0
for kind in order:
streak = longest_streak(dates, kind)
if streak >= 3 and streak > best_len:
best_len = streak
best_kind = kind
if best_kind is not None:
result.append((merchant, amount, currency, best_kind))
result.sort(key=lambda x: (x[0], x[1], x[2]))
return resultTime complexity: O(n log n). Space complexity: O(n).
Hints
- First group transactions by merchant, amount, and currency. Then you only need to analyze dates within each group.
- After sorting dates in a group, look for the longest consecutive streak for each frequency type by checking adjacent date gaps.