PracHub

Design a chargeback ingestion and export system

Last updated: Jun 17, 2026

Quick Overview

This question evaluates system design skills around reliable ingestion and scheduled export of financial transaction records, specifically testing competencies in data modeling, batching strategies, idempotency and retry semantics, durable storage, error handling, monitoring, and security/compliance for payment and PII data.

Company: Nubank

Role: Software Engineer

Category: System Design

Difficulty: easy

Interview Round: Onsite

Design a chargeback ingestion and export system for a bank. The system receives chargeback-related transaction records from internal payment and dispute systems, validates and durably stores them, then generates a CSV file and transfers it to Mastercard over FTP four times per day. Walk through the architecture, the data model, the batching strategy, the reliability and idempotency guarantees, error handling, monitoring, and the security considerations specific to handling payment/cardholder data.

Hint: Where to start

Decompose the system into two loosely coupled halves connected by durable storage: a continuous ingestion path (event in → validate → store) and a scheduled export path (batch cutoff → CSV → FTP). Keeping them separate is what lets you retry, audit, and reconcile without losing or double-sending records.

Hint: Idempotency

Duplicates can leak in at two independent points: ingestion (the source re-sends the same event) and export (a retry or a scheduler race lets one record reach Mastercard in two different files). For each point, ask: what property of the record or the batch-claim step would make a re-send a no-op rather than a second copy? Don't solve both with the same mechanism, because they fail differently.

Hint: The FTP / "exactly-once to partner" pitfall

Work through the nastiest partial-failure ordering: the bytes land on the partner's server, but the line of code that records "uploaded" never runs (crash, timeout). On the next run, how does the transfer worker tell the difference between "never sent" and "already sent"? And separately: while a file is mid-upload, what stops the partner from reading a half-written file? Both questions point at how you name and finalize the remote file.
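
The two duplicate-entry points can be closed with two different mechanisms, and a small SQLite sketch may make the distinction concrete. Everything named here (`chargeback`, `export_batch`, `source_key`, the status values, the ISO-8601 text timestamps) is an assumption for illustration, not a schema given by the question. Ingestion relies on a unique constraint on the source-supplied key; export relies on a one-batch-per-cutoff constraint plus a single transaction that claims records and flips their status.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: table, column, and status names are illustrative only.
SCHEMA = """
CREATE TABLE export_batch (
    id     INTEGER PRIMARY KEY,
    cutoff TEXT NOT NULL UNIQUE,              -- one batch per window
    status TEXT NOT NULL DEFAULT 'CLAIMED'
);
CREATE TABLE chargeback (
    id           INTEGER PRIMARY KEY,
    source_key   TEXT NOT NULL UNIQUE,        -- idempotency key from the source system
    amount_minor INTEGER NOT NULL,
    received_at  TEXT NOT NULL,               -- ISO-8601 UTC, fixed width, so text compares correctly
    status       TEXT NOT NULL DEFAULT 'RECEIVED',
    batch_id     INTEGER REFERENCES export_batch(id)
);
"""


def ingest(conn: sqlite3.Connection, source_key: str, amount_minor: int, received_at: str) -> bool:
    """Store a record once; a re-sent event with the same key is a no-op."""
    with conn:
        cur = conn.execute(
            "INSERT INTO chargeback (source_key, amount_minor, received_at) "
            "VALUES (?, ?, ?) ON CONFLICT(source_key) DO NOTHING",
            (source_key, amount_minor, received_at),
        )
    return cur.rowcount == 1


def claim_batch(conn: sqlite3.Connection, cutoff: str) -> Optional[int]:
    """Create the batch for `cutoff` and claim its records in one transaction.

    Returns the new batch id, or None if this window was already claimed.
    """
    with conn:  # batch creation and record assignment commit together or not at all
        cur = conn.execute(
            "INSERT INTO export_batch (cutoff) VALUES (?) ON CONFLICT(cutoff) DO NOTHING",
            (cutoff,),
        )
        if cur.rowcount == 0:
            return None  # a previous or concurrent run owns this window
        batch_id = cur.lastrowid
        conn.execute(
            "UPDATE chargeback SET batch_id = ?, status = 'BATCHED' "
            "WHERE batch_id IS NULL AND status = 'RECEIVED' AND received_at < ?",
            (batch_id, cutoff),
        )
    return batch_id
```

In this sketch a second `ingest` call with the same key returns `False`, and a second `claim_batch` call for the same cutoff returns `None`, so neither a source re-send nor a scheduler race produces a second copy. A production store with concurrent writers would need the same constraints, but the locking behavior would have to be checked against that database rather than assumed from SQLite.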

Jun 4, 2026, 12:00 AM

Constraints & Assumptions

  • Network: Mastercard accepts files over FTP; in practice you should push for SFTP/FTPS. Assume the partner dictates the file schema, header/trailer format, and naming convention.
  • Cadence: four scheduled exports per day (e.g. windows at 00:00 / 06:00 / 12:00 / 18:00), each with a hard cutoff so a record either makes the current batch or the next one.
  • Volume: assume moderate-to-high but bounded daily chargeback volume — a relational store with strong transactional guarantees is acceptable; call out where you'd partition or scale out.
  • Correctness bar: no lost chargeback records and no duplicate exports are the dominant requirements; some end-to-end latency (up to the next batch window) is acceptable.
  • Compliance: the records are financial data tied to payment disputes (transaction and merchant identifiers, amounts) and may reference card / PII fields depending on the source schema, so the system sits in a regulated, audit-heavy environment. Part of the design is deciding what cardholder data, if any, the records actually carry and keeping anything sensitive out of logs and exports — treat any PAN-class field as PCI-scoped if present.
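
The "hard cutoff" in the cadence constraint can be made precise with a small function. The sketch below assumes the example windows from the list above (00:00 / 06:00 / 12:00 / 18:00), assumes they are defined in UTC, and assumes half-open windows, so a record stamped exactly at a boundary belongs to the next batch. The function name `export_cutoff_for` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

WINDOW_HOURS = 6  # four exports per day; assumed to be evenly spaced


def export_cutoff_for(received_at: datetime) -> datetime:
    """Return the cutoff of the batch that will carry a record received at `received_at`.

    Windows are half-open [start, start + 6h), so every record maps to exactly one batch.
    """
    if received_at.tzinfo is None:
        raise ValueError("received_at must be timezone-aware")
    ts = received_at.astimezone(timezone.utc)
    window_start = ts.replace(
        hour=ts.hour - ts.hour % WINDOW_HOURS, minute=0, second=0, microsecond=0
    )
    return window_start + timedelta(hours=WINDOW_HOURS)
```

A record received at 05:59:59 maps to the 06:00 cutoff, while one received at exactly 06:00:00 maps to the 12:00 cutoff. Deriving the cutoff from the stored receive timestamp, rather than from the scheduler's wall clock at run time, keeps the assignment stable if a run starts late.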

Clarifying Questions to Ask

  • What exact CSV schema, encoding, and header/trailer/control-record format does Mastercard require, and is there a per-file size or record-count limit?
  • Does Mastercard send back acknowledgment / reject files, and through what channel, so that we can reconcile accepted vs. rejected records?
  • Is plain FTP truly mandated, or are SFTP/FTPS available? Are there fixed delivery-window SLAs we'd be penalized for missing?
  • What is the source of truth for "this chargeback already exists" — can the same dispute legitimately produce multiple records?
  • What are the data-retention and audit requirements for raw records and generated files?
  • What daily/peak record volume should the design target, and is there a maximum acceptable end-to-end delay from ingestion to export?

What a Strong Answer Covers

  • Two-phase decomposition: ingestion path vs. scheduled export path, connected by durable storage — and why that separation matters.
  • Data model: record entity with an idempotency key and a clear status lifecycle, plus a batch entity and a record-to-batch mapping for reconciliation.
  • Idempotency at both ends: unique source key + DB constraint on ingest; transactional, exclusive batch assignment on export.
  • Batching with a clear cutoff and a race-free "claim records → assign to batch → flip status" transaction.
  • CSV generation: versioned schema, deterministic ordering, checksum, immutable file stored in object storage before transfer.
  • Transfer reliability: retry with backoff, no file regeneration on retry, atomic remote rename, missed-SLA alerting, and a stated preference for SFTP/FTPS over plain FTP.
  • Failure analysis of the partial-failure windows (CSV ok / FTP fails; upload ok / status write fails) and how each is made safe.
  • Reconciliation against partner ack files and a status update loop for rejects.
  • Observability: the specific metrics and alerts (backlog, missed window, upload failure, validation-spike, reconciliation mismatch).
  • Security/compliance: encryption in transit and at rest, secrets management, least privilege, masking/no-logging of cardholder data, retention.
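
The CSV-generation point above (versioned schema, deterministic ordering, checksum, immutable file) can be sketched as a pure function from a batch's records to bytes. The column list, the `H`/`T` header and trailer rows, the schema version string, and the function name are all hypothetical, since the partner dictates the real layout. The point of the sketch is only that the same batch always renders to the same bytes, so a retry can resend the stored file instead of regenerating it.

```python
import csv
import hashlib
import io
from typing import Iterable, Mapping, Tuple

SCHEMA_VERSION = "v1"  # hypothetical; bump when the file layout changes
COLUMNS = ("record_id", "transaction_id", "amount_minor", "currency", "reason_code")


def render_batch_csv(batch_id: str, records: Iterable[Mapping[str, object]]) -> Tuple[bytes, str]:
    """Render a batch to CSV bytes plus a SHA-256 hex digest.

    Rows are sorted by record_id so the output does not depend on query order.
    """
    rows = sorted(records, key=lambda r: str(r["record_id"]))
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\r\n")
    writer.writerow(("H", SCHEMA_VERSION, batch_id))   # illustrative header record
    for r in rows:
        writer.writerow([r[c] for c in COLUMNS])
    writer.writerow(("T", len(rows)))                  # illustrative trailer with record count
    data = buf.getvalue().encode("utf-8")
    return data, hashlib.sha256(data).hexdigest()
```

The returned bytes would be written once to object storage under the batch id, with the digest stored on the batch row; the transfer step then reads that stored object on every attempt. Only whitelisted columns reach the file, which is one way to keep any sensitive source field out of exports.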

Follow-up Questions

  • A scheduled batch's FTP upload fails all retries and misses its window. Exactly what state is the batch in, what gets alerted, and how does the next run avoid double-sending or skipping those records?
  • The upload to Mastercard succeeds but your process crashes before recording success. On the next run, how do you determine the file already arrived and avoid the partner processing it twice?
  • Mastercard's ack file reports that 3% of a batch was rejected for a bad reason code. Walk through reconciliation: how do those records get re-validated, re-queued, and exported without re-sending the accepted 97%?
  • Daily volume grows 10x. Which components become bottlenecks first, and how do you scale ingestion, validation, CSV generation, and transfer independently?
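
For the second follow-up (upload succeeds, status write is lost), one possible approach is a deterministic final file name combined with a temporary name and a rename. The sketch below uses a hypothetical `RemoteDir` interface and an in-memory stand-in rather than a real FTP/SFTP client. It assumes the partner ignores files ending in `.part`, that the remote rename is atomic, and that delivered files stay visible under their final name long enough to be checked; none of these is guaranteed by the question.

```python
from typing import Dict, Protocol


class RemoteDir(Protocol):
    """Minimal operations assumed to be available on the partner's drop directory."""

    def exists(self, name: str) -> bool: ...
    def put(self, name: str, data: bytes) -> None: ...
    def rename(self, src: str, dst: str) -> None: ...


class InMemoryRemote:
    """Test double standing in for the remote directory."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def exists(self, name: str) -> bool:
        return name in self.files

    def put(self, name: str, data: bytes) -> None:
        self.files[name] = data

    def rename(self, src: str, dst: str) -> None:
        self.files[dst] = self.files.pop(src)


def deliver(remote: RemoteDir, final_name: str, data: bytes) -> str:
    """Upload `data` so that re-running after a crash does not send a second file.

    `final_name` must be derived from the batch id, never from the current time.
    """
    if remote.exists(final_name):
        return "ALREADY_DELIVERED"      # earlier run finished the upload; only record the status
    tmp_name = final_name + ".part"
    remote.put(tmp_name, data)          # overwrites any half-written leftover
    remote.rename(tmp_name, final_name) # file becomes visible under its real name in one step
    return "DELIVERED"
```

Calling `deliver` twice with the same name returns `DELIVERED` and then `ALREADY_DELIVERED`, leaving a single file on the remote side. If the partner moves or deletes files after pickup, the existence check alone is not enough, and the worker would need the partner's acknowledgment file or an archive listing to tell "never sent" from "already sent".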
