PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches
|Home/Data Manipulation (SQL/Python)/Capital One

Reconcile ledgers with SQL/Python and late events

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to implement data reconciliation and ETL engineering skills, including SQL window functions and joins, time-based FX rate lookups, matching logic for late-arriving events, idempotent daily pipeline design with deterministic keys and exactly-once semantics, and basic statistical charting in Python.

  • Medium
  • Capital One
  • Data Manipulation (SQL/Python)
  • Data Scientist

Reconcile ledgers with SQL/Python and late events

Company: Capital One

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

You own a daily ETL + reconciliation job between two financial ledgers. Late postings (“delay time”) up to 48 hours are common. Schema: - payments_raw(payment_id INT, user_id INT, amount DECIMAL(10,2), currency CHAR(3), created_at TIMESTAMP) - ledger_a(txn_id STRING, user_id INT, amount_usd DECIMAL(10,2), posted_at TIMESTAMP) - ledger_b(txn_id STRING, user_id INT, amount_usd DECIMAL(10,2), posted_at TIMESTAMP) - fx_rates(rate_date DATE, currency CHAR(3), usd_rate DECIMAL(10,6)) -- USD per 1 currency unit Small ASCII samples: payments_raw payment_id | user_id | amount | currency | created_at 1 | 101 | 100.00 | USD | 2025-08-30 10:05:00 2 | 102 | 90.00 | EUR | 2025-08-30 16:40:00 3 | 101 | 50.00 | USD | 2025-08-31 23:30:00 4 | 103 | 100.00 | EUR | 2025-09-01 01:20:00 fx_rates rate_date | currency | usd_rate 2025-08-30 | USD | 1.000000 2025-08-30 | EUR | 1.100000 2025-08-31 | USD | 1.000000 2025-08-31 | EUR | 1.120000 2025-09-01 | USD | 1.000000 2025-09-01 | EUR | 1.150000 ledger_a txn_id | user_id | amount_usd | posted_at A-1 | 101 | 100.00 | 2025-08-30 10:06:00 A-2 | 102 | 99.00 | 2025-08-30 18:00:00 A-3 | 103 | 115.00 | 2025-09-01 08:00:00 ledger_b txn_id | user_id | amount_usd | posted_at B-1 | 101 | 100.00 | 2025-09-02 11:00:00 B-2 | 101 | 50.00 | 2025-09-01 00:10:00 B-3 | 102 | 99.01 | 2025-08-30 18:02:00 Tasks: 1) Write a single SQL query that produces a reconciliation report at the payment level with columns: payment_id, user_id, amount_usd_from_fx, a_txn_id, b_txn_id, match_status (one of: matched, late_within_48h, late_beyond_48h, amount_mismatch, missing_in_a, missing_in_b), late_by_hours (NULL if not late), and abs_amount_diff. Rules: (a) Convert payments_raw.amount to USD using the fx rate effective on created_at’s date (use the most recent rate on or before created_at). (b) Attempt to match each payment to at most one txn from each ledger on same user_id, |amount_usd_from_fx − amount_usd| ≤ 0.01, and |posted_at − created_at| ≤ 48 hours for a normal match; if > 48 hours but otherwise matching, classify as late_beyond_48h and compute late_by_hours; if amounts within 0.01 in only one ledger, set the other as missing_in_*. If both ledgers match but amounts differ by > 0.01, classify amount_mismatch. Break ties by the smallest absolute time difference to created_at using window functions. 2) In Python, outline an idempotent daily ETL design that: (a) partitions by event_date (created_at::date), (b) maintains a rolling 72-hour backfill to catch late arrivals, (c) writes a reconciliation snapshot table with a deterministic primary key, and (d) guarantees exactly-once downstream effects. Mention how you would detect and merge late_beyond_48h fixes without duplicating rows. 3) Generate a discrepancy distribution chart. First, produce a pie chart of counts by match_status using matplotlib (or plotly). Then justify why a bar chart with percentages and 95% CIs is superior here and sketch how you’d produce it. Name the exact Python packages you’d use. 4) Defend Python vs R for this pipeline. Be specific about libraries (e.g., pandas/pyarrow/duckdb vs data.table/arrow/dbplyr), deployment/runtime concerns, and interoperability with the rest of a modern data stack.

Quick Answer: This question evaluates a candidate's ability to implement data reconciliation and ETL engineering skills, including SQL window functions and joins, time-based FX rate lookups, matching logic for late-arriving events, idempotent daily pipeline design with deterministic keys and exactly-once semantics, and basic statistical charting in Python.

Related Interview Questions

  • Clean and Merge Housing Data - Capital One (easy)
  • Find Lowest Prices for Highly Rated Categories - Capital One (medium)
  • Write SQL to compute campaign net revenue - Capital One (Medium)
  • Merge CSVs and build revenue pivot with pandas - Capital One (Medium)
  • Find top category per region in Aug 2025 - Capital One (Medium)
Capital One logo
Capital One
Oct 13, 2025, 9:49 PM
Data Scientist
Technical Screen
Data Manipulation (SQL/Python)
1
0

You own a daily ETL + reconciliation job between two financial ledgers. Late postings (“delay time”) up to 48 hours are common. Schema:

  • payments_raw(payment_id INT, user_id INT, amount DECIMAL(10,2), currency CHAR(3), created_at TIMESTAMP)
  • ledger_a(txn_id STRING, user_id INT, amount_usd DECIMAL(10,2), posted_at TIMESTAMP)
  • ledger_b(txn_id STRING, user_id INT, amount_usd DECIMAL(10,2), posted_at TIMESTAMP)
  • fx_rates(rate_date DATE, currency CHAR(3), usd_rate DECIMAL(10,6)) -- USD per 1 currency unit Small ASCII samples: payments_raw payment_id | user_id | amount | currency | created_at 1 | 101 | 100.00 | USD | 2025-08-30 10:05:00 2 | 102 | 90.00 | EUR | 2025-08-30 16:40:00 3 | 101 | 50.00 | USD | 2025-08-31 23:30:00 4 | 103 | 100.00 | EUR | 2025-09-01 01:20:00

fx_rates rate_date | currency | usd_rate 2025-08-30 | USD | 1.000000 2025-08-30 | EUR | 1.100000 2025-08-31 | USD | 1.000000 2025-08-31 | EUR | 1.120000 2025-09-01 | USD | 1.000000 2025-09-01 | EUR | 1.150000

ledger_a txn_id | user_id | amount_usd | posted_at A-1 | 101 | 100.00 | 2025-08-30 10:06:00 A-2 | 102 | 99.00 | 2025-08-30 18:00:00 A-3 | 103 | 115.00 | 2025-09-01 08:00:00

ledger_b txn_id | user_id | amount_usd | posted_at B-1 | 101 | 100.00 | 2025-09-02 11:00:00 B-2 | 101 | 50.00 | 2025-09-01 00:10:00 B-3 | 102 | 99.01 | 2025-08-30 18:02:00 Tasks:

  1. Write a single SQL query that produces a reconciliation report at the payment level with columns: payment_id, user_id, amount_usd_from_fx, a_txn_id, b_txn_id, match_status (one of: matched, late_within_48h, late_beyond_48h, amount_mismatch, missing_in_a, missing_in_b), late_by_hours (NULL if not late), and abs_amount_diff. Rules: (a) Convert payments_raw.amount to USD using the fx rate effective on created_at’s date (use the most recent rate on or before created_at). (b) Attempt to match each payment to at most one txn from each ledger on same user_id, |amount_usd_from_fx − amount_usd| ≤ 0.01, and |posted_at − created_at| ≤ 48 hours for a normal match; if > 48 hours but otherwise matching, classify as late_beyond_48h and compute late_by_hours; if amounts within 0.01 in only one ledger, set the other as missing_in_*. If both ledgers match but amounts differ by > 0.01, classify amount_mismatch. Break ties by the smallest absolute time difference to created_at using window functions.
  2. In Python, outline an idempotent daily ETL design that: (a) partitions by event_date (created_at::date), (b) maintains a rolling 72-hour backfill to catch late arrivals, (c) writes a reconciliation snapshot table with a deterministic primary key, and (d) guarantees exactly-once downstream effects. Mention how you would detect and merge late_beyond_48h fixes without duplicating rows.
  3. Generate a discrepancy distribution chart. First, produce a pie chart of counts by match_status using matplotlib (or plotly). Then justify why a bar chart with percentages and 95% CIs is superior here and sketch how you’d produce it. Name the exact Python packages you’d use.
  4. Defend Python vs R for this pipeline. Be specific about libraries (e.g., pandas/pyarrow/duckdb vs data.table/arrow/dbplyr), deployment/runtime concerns, and interoperability with the rest of a modern data stack.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Capital One•More Data Scientist•Capital One Data Scientist•Capital One Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.