How do I approach System Design interview questions?

System Design questions require both an understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

What difficulty level is this interview question?

This is an easy-difficulty System Design question, commonly asked during Onsite rounds at Nubank.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Nubank during technical interviews.

Design a chargeback ingestion and export system

Q: Design a chargeback ingestion and export system

This question evaluates system design skills around reliable ingestion and scheduled export of financial transaction records, specifically testing competencies in data modeling, batching strategies, idempotency and retry semantics, durable storage, error handling, monitoring, and security/compliance for payment and PII data.

Design a chargeback ingestion and export system for a bank.

The system receives chargeback-related transaction records from internal payment and dispute systems, validates and durably stores them, then generates a CSV file and transfers it to Mastercard over FTP four times per day. Walk through the architecture, the data model, the batching strategy, the reliability and idempotency guarantees, error handling, monitoring, and the security considerations specific to handling payment/cardholder data.

Constraints & Assumptions

Network: Mastercard accepts files over FTP; in practice you should push for SFTP/FTPS. Assume the partner dictates the file schema, header/trailer format, and naming convention.
Cadence: four scheduled exports per day (e.g. windows at 00:00 / 06:00 / 12:00 / 18:00), each with a hard cutoff so a record either makes the current batch or the next one.
Volume: assume moderate-to-high but bounded daily chargeback volume — a relational store with strong transactional guarantees is acceptable; call out where you'd partition or scale out.
Correctness bar: no lost chargeback records and no duplicate exports are the dominant requirements; some end-to-end latency (up to the next batch window) is acceptable.
Compliance: the records are financial data tied to payment disputes (transaction and merchant identifiers, amounts) and may reference card / PII fields depending on the source schema, so the system sits in a regulated, audit-heavy environment. Part of the design is deciding what cardholder data, if any, the records actually carry and keeping anything sensitive out of logs and exports — treat any PAN-class field as PCI-scoped if present.
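The hard-cutoff rule in the cadence constraint can be made concrete with a small helper. The following is a minimal sketch, not a prescribed implementation: the function name `next_cutoff`, the UTC window hours (taken from the example cadence above), and the choice to treat a record arriving exactly at a cutoff as belonging to the next window are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumed export windows in UTC hours, copied from the example cadence.
WINDOW_HOURS = (0, 6, 12, 18)


def next_cutoff(received_at: datetime) -> datetime:
    """Return the cutoff of the batch that a record received at `received_at` joins.

    Assumption: cutoffs are exclusive, so a record received exactly at a
    cutoff rolls into the following window. `received_at` should be
    timezone-aware; a naive value would be interpreted as local time.
    """
    ts = received_at.astimezone(timezone.utc)
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    for hour in WINDOW_HOURS:
        cutoff = day_start + timedelta(hours=hour)
        if ts < cutoff:
            return cutoff
    # Later than the last window of the day: first window of the next day.
    return day_start + timedelta(days=1, hours=WINDOW_HOURS[0])
```

Because the function depends only on the record's receive timestamp, every record maps to exactly one window, which is what lets the export job decide "current batch or next one" without ambiguity.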

Clarifying Questions to Ask

What exact CSV schema, encoding, and header/trailer/control-record format does Mastercard require, and is there a per-file or per-record count limit?
Does Mastercard send back acknowledgment / reject files, and through what channel, so that we can reconcile accepted vs. rejected records?
Is plain FTP truly mandated, or are SFTP/FTPS available? Are there fixed delivery-window SLAs we'd be penalized for missing?
What is the source of truth for "this chargeback already exists" — can the same dispute legitimately produce multiple records?
What are the data-retention and audit requirements for raw records and generated files?
What daily/peak record volume should the design target, and is there a maximum acceptable end-to-end delay from ingestion to export?

What a Strong Answer Covers

Two-phase decomposition: ingestion path vs. scheduled export path, connected by durable storage — and why that separation matters.
Data model: record entity with an idempotency key and a clear status lifecycle, plus a batch entity and a record-to-batch mapping for reconciliation.
Idempotency at both ends: unique source key + DB constraint on ingest; transactional, exclusive batch assignment on export.
Batching with a clear cutoff and a race-free "claim records → assign to batch → flip status" transaction.
CSV generation: versioned schema, deterministic ordering, checksum, immutable file stored in object storage before transfer.
Transfer reliability: retry with backoff, no file regeneration on retry, atomic remote rename, missed-SLA alerting, and the SFTP/FTPS-over-FTP security stance.
Failure analysis of the partial-failure windows (CSV ok / FTP fails; upload ok / status write fails) and how each is made safe.
Reconciliation against partner ack files and a status update loop for rejects.
Observability: the specific metrics and alerts (backlog, missed window, upload failure, validation-spike, reconciliation mismatch).
Security/compliance: encryption in transit and at rest, secrets management, least privilege, masking/no-logging of cardholder data, retention.
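To illustrate the data model and the two idempotency points listed above, here is a minimal sketch using SQLite from the Python standard library. Everything in it is an assumption made for illustration: the table and column names, the status values `PENDING` and `BATCHED`, the helper names `ingest` and `claim_batch`, and the use of ISO-8601 UTC strings of identical format so that text comparison matches time order. A production design would likely use a server database and row-level locking instead.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE batches (
    batch_id     INTEGER PRIMARY KEY,
    cutoff       TEXT NOT NULL UNIQUE,          -- at most one batch per export window
    status       TEXT NOT NULL DEFAULT 'CREATED'
);
CREATE TABLE records (
    record_id    INTEGER PRIMARY KEY,
    source_key   TEXT NOT NULL UNIQUE,          -- idempotency key from the source system
    amount_minor INTEGER NOT NULL,
    received_at  TEXT NOT NULL,                 -- ISO-8601 UTC, fixed format (assumption)
    status       TEXT NOT NULL DEFAULT 'PENDING',
    batch_id     INTEGER REFERENCES batches(batch_id)
);
"""


def open_db() -> sqlite3.Connection:
    # isolation_level=None puts the driver in autocommit mode, so the
    # explicit BEGIN/COMMIT statements below control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def ingest(conn: sqlite3.Connection, source_key: str, amount_minor: int, received_at: str) -> bool:
    """Store a record once; a redelivery with the same source_key is a no-op."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO records (source_key, amount_minor, received_at) "
        "VALUES (?, ?, ?)",
        (source_key, amount_minor, received_at),
    )
    return cur.rowcount == 1  # False means the unique constraint absorbed a duplicate


def claim_batch(conn: sqlite3.Connection, cutoff: str) -> Optional[int]:
    """Create the batch for `cutoff` and attach all eligible records atomically.

    Returns the new batch id, or None if a batch for this cutoff already exists.
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        cur = conn.execute("INSERT OR IGNORE INTO batches (cutoff) VALUES (?)", (cutoff,))
        if cur.rowcount == 0:
            conn.execute("COMMIT")  # nothing changed; another run owns this window
            return None
        batch_id = cur.lastrowid
        conn.execute(
            "UPDATE records SET batch_id = ?, status = 'BATCHED' "
            "WHERE status = 'PENDING' AND batch_id IS NULL AND received_at < ?",
            (batch_id, cutoff),
        )
        conn.execute("COMMIT")
        return batch_id
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The unique constraint on `source_key` makes ingestion safe under at-least-once delivery, and the unique constraint on `cutoff` means two overlapping scheduler runs cannot both claim the same window. Batch creation and record assignment commit together, so a record is never left half-claimed.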

Follow-up Questions

A scheduled batch's FTP upload fails all retries and misses its window. Exactly what state is the batch in, what gets alerted, and how does the next run avoid double-sending or skipping those records?
The upload to Mastercard succeeds but your process crashes before recording success. On the next run, how do you determine the file already arrived and avoid the partner processing it twice?
Mastercard's ack file reports that 3% of a batch was rejected for a bad reason code. Walk through reconciliation: how do those records get re-validated, re-queued, and exported without re-sending the accepted 97%?
Daily volume grows 10x. Which components become bottlenecks first, and how do you scale ingestion, validation, CSV generation, and transfer independently?
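The first two follow-ups hinge on sending byte-identical files on every retry. The sketch below shows one way to get there, under stated assumptions: the two-column schema is hypothetical (the real schema is dictated by the partner), `upload` is a caller-supplied function standing in for the actual FTP/SFTP client, and only `OSError` is treated as retryable.

```python
import csv
import hashlib
import io
import time
from typing import Callable, Iterable, Tuple

Row = Tuple[str, int]  # (source_key, amount_minor): hypothetical schema for illustration


def render_csv(rows: Iterable[Row]) -> bytes:
    """Render rows in sorted order so the same batch always yields the same bytes."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source_key", "amount_minor"])
    for row in sorted(rows):
        writer.writerow(row)
    return buf.getvalue().encode("utf-8")


def sha256_hex(payload: bytes) -> str:
    """Checksum stored alongside the file so a later run can verify what was sent."""
    return hashlib.sha256(payload).hexdigest()


def upload_with_retry(
    payload: bytes,
    upload: Callable[[bytes], None],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send the already-rendered payload, retrying with exponential backoff.

    The payload is never regenerated between attempts. Returns False when all
    attempts fail, so the caller can mark the batch as failed and alert.
    """
    for attempt in range(attempts):
        try:
            upload(payload)
            return True
        except OSError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return False
```

In this sketch the rendered bytes and their checksum would be persisted (for example in object storage) before the first upload attempt. A run that crashed after a successful upload can then compare the stored checksum and file name against what is on the remote side, or simply resend the same file under the same name, rather than building a new file from the database.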
