PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches
|Home/Data Manipulation (SQL/Python)/Gemini

Write SQL/Python for ACH fraud analytics

Last updated: Mar 29, 2026

Quick Overview

This question evaluates time-windowed analytics, SQL window functions and joins, ranking and event correlation, streaming de-duplication in Python, and domain knowledge of ACH payment and return behaviors within the Data Manipulation (SQL/Python) category.

  • Medium
  • Gemini
  • Data Manipulation (SQL/Python)
  • Data Scientist

Write SQL/Python for ACH fraud analytics

Company: Gemini

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

As of today (2025-09-01), use the following schema and tiny samples to answer. Provide SQL for the SQL parts and Python for the Python part. Schema - users(user_id INT, created_at TIMESTAMP, country TEXT) - transactions(txn_id TEXT, user_id INT, direction TEXT, method TEXT, amount_cents INT, created_at TIMESTAMP, status TEXT) - ach_returns(txn_id TEXT, return_code TEXT, returned_at TIMESTAMP) - devices(device_id TEXT, user_id INT, fingerprint TEXT, first_seen_at TIMESTAMP) - logins(login_id TEXT, user_id INT, device_id TEXT, ip TEXT, ts TIMESTAMP) Sample rows users user_id | created_at | country 1 | 2025-06-10 08:00:00 | US 2 | 2025-08-01 12:00:00 | US 3 | 2025-08-20 09:00:00 | CA transactions txn_id | user_id | direction | method | amount_cents | created_at | status t1 | 1 | credit | ACH | 50000 | 2025-08-31 10:00:00 | posted t2 | 1 | credit | ACH | 40000 | 2025-08-31 18:00:00 | posted t3 | 1 | credit | ACH | 30000 | 2025-09-01 09:30:00 | posted t4 | 1 | debit | ACH | 20000 | 2025-09-02 12:00:00 | posted t5 | 2 | credit | ACH | 150000 | 2025-08-30 02:00:00 | posted t6 | 2 | credit | ACH | 150000 | 2025-09-01 02:30:00 | posted t7 | 3 | credit | CARD | 70000 | 2025-09-01 11:00:00 | posted ach_returns txn_id | return_code | returned_at t2 | R10 | 2025-09-03 08:00:00 t5 | R01 | 2025-09-05 09:00:00 devices device_id | user_id | fingerprint | first_seen_at d1 | 1 | abc123 | 2025-08-01 09:00:00 d2 | 1 | abc124 | 2025-08-31 17:30:00 d3 | 2 | zyx999 | 2025-08-29 22:00:00 d4 | 3 | abc123 | 2025-09-01 10:45:00 logins login_id | user_id | device_id | ip | ts l1 | 1 | d1 | 1.1.1.1 | 2025-08-31 09:50:00 l2 | 1 | d2 | 1.1.1.1 | 2025-08-31 17:40:00 l3 | 1 | d1 | 2.2.2.2 | 2025-09-01 09:25:00 l4 | 2 | d3 | 3.3.3.3 | 2025-09-01 02:25:00 l5 | 3 | d4 | 4.4.4.4 | 2025-09-01 10:50:00 Tasks A) SQL (window + joins): For each user, find the earliest rolling 24h window that ends on or before 2025-09-01 23:59:59 in which they have at least 3 ACH credit transactions and at least one of those credits is returned within 5 days of its created_at. Output: user_id, window_start, window_end, num_ach_credits_in_window, num_returns_within_5d, net_exposure_cents. Define net_exposure_cents as sum(amount_cents) of the ACH credits in that 24h window minus sum(amount_cents) of any ACH debits by the same user occurring between window_start and the earliest return timestamp for those credits. Use only transactions and ach_returns. B) SQL (ranking + window): As of 2025-09-01, for each user, rank their devices by the count of returned ACH credits linked to that device in the last 30 days (i.e., credits within 2025-08-02…2025-09-01 whose txn_id appears in ach_returns with returned_at in the same interval, where the device is the most recent login within 60 minutes before the credit). Output the top device per user with columns: user_id, device_id, fingerprint, returned_count_30d. C) Python: Given logins (streaming, possibly out of order by ≤5 minutes), collapse device fingerprints that differ by exactly one character (case-sensitive) into a canonical fingerprint (e.g., union-find over Hamming distance 1). Return all clusters where the canonical fingerprint is used by ≥3 distinct users within any 7-day window ending on 2025-09-01. Output list of tuples (canonical_fingerprint, window_start, window_end, distinct_user_count). State time/space complexity and how you handle late events and ties.

Quick Answer: This question evaluates time-windowed analytics, SQL window functions and joins, ranking and event correlation, streaming de-duplication in Python, and domain knowledge of ACH payment and return behaviors within the Data Manipulation (SQL/Python) category.

Gemini logo
Gemini
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Data Manipulation (SQL/Python)
2
0

As of today (2025-09-01), use the following schema and tiny samples to answer. Provide SQL for the SQL parts and Python for the Python part.

Schema

  • users(user_id INT, created_at TIMESTAMP, country TEXT)
  • transactions(txn_id TEXT, user_id INT, direction TEXT, method TEXT, amount_cents INT, created_at TIMESTAMP, status TEXT)
  • ach_returns(txn_id TEXT, return_code TEXT, returned_at TIMESTAMP)
  • devices(device_id TEXT, user_id INT, fingerprint TEXT, first_seen_at TIMESTAMP)
  • logins(login_id TEXT, user_id INT, device_id TEXT, ip TEXT, ts TIMESTAMP)

Sample rows users user_id | created_at | country 1 | 2025-06-10 08:00:00 | US 2 | 2025-08-01 12:00:00 | US 3 | 2025-08-20 09:00:00 | CA

transactions txn_id | user_id | direction | method | amount_cents | created_at | status t1 | 1 | credit | ACH | 50000 | 2025-08-31 10:00:00 | posted t2 | 1 | credit | ACH | 40000 | 2025-08-31 18:00:00 | posted t3 | 1 | credit | ACH | 30000 | 2025-09-01 09:30:00 | posted t4 | 1 | debit | ACH | 20000 | 2025-09-02 12:00:00 | posted t5 | 2 | credit | ACH | 150000 | 2025-08-30 02:00:00 | posted t6 | 2 | credit | ACH | 150000 | 2025-09-01 02:30:00 | posted t7 | 3 | credit | CARD | 70000 | 2025-09-01 11:00:00 | posted

ach_returns txn_id | return_code | returned_at t2 | R10 | 2025-09-03 08:00:00 t5 | R01 | 2025-09-05 09:00:00

devices device_id | user_id | fingerprint | first_seen_at d1 | 1 | abc123 | 2025-08-01 09:00:00 d2 | 1 | abc124 | 2025-08-31 17:30:00 d3 | 2 | zyx999 | 2025-08-29 22:00:00 d4 | 3 | abc123 | 2025-09-01 10:45:00

logins login_id | user_id | device_id | ip | ts l1 | 1 | d1 | 1.1.1.1 | 2025-08-31 09:50:00 l2 | 1 | d2 | 1.1.1.1 | 2025-08-31 17:40:00 l3 | 1 | d1 | 2.2.2.2 | 2025-09-01 09:25:00 l4 | 2 | d3 | 3.3.3.3 | 2025-09-01 02:25:00 l5 | 3 | d4 | 4.4.4.4 | 2025-09-01 10:50:00

Tasks A) SQL (window + joins): For each user, find the earliest rolling 24h window that ends on or before 2025-09-01 23:59:59 in which they have at least 3 ACH credit transactions and at least one of those credits is returned within 5 days of its created_at. Output: user_id, window_start, window_end, num_ach_credits_in_window, num_returns_within_5d, net_exposure_cents. Define net_exposure_cents as sum(amount_cents) of the ACH credits in that 24h window minus sum(amount_cents) of any ACH debits by the same user occurring between window_start and the earliest return timestamp for those credits. Use only transactions and ach_returns. B) SQL (ranking + window): As of 2025-09-01, for each user, rank their devices by the count of returned ACH credits linked to that device in the last 30 days (i.e., credits within 2025-08-02…2025-09-01 whose txn_id appears in ach_returns with returned_at in the same interval, where the device is the most recent login within 60 minutes before the credit). Output the top device per user with columns: user_id, device_id, fingerprint, returned_count_30d. C) Python: Given logins (streaming, possibly out of order by ≤5 minutes), collapse device fingerprints that differ by exactly one character (case-sensitive) into a canonical fingerprint (e.g., union-find over Hamming distance 1). Return all clusters where the canonical fingerprint is used by ≥3 distinct users within any 7-day window ending on 2025-09-01. Output list of tuples (canonical_fingerprint, window_start, window_end, distinct_user_count). State time/space complexity and how you handle late events and ties.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Gemini•More Data Scientist•Gemini Data Scientist•Gemini Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.