This question evaluates a candidate's competency in duplicate detection, record linkage, entity resolution, and data quality assessment within credit card transaction datasets.
You are given a dataset of credit card transaction records and suspect that some records are duplicates.
Discuss how you would identify and handle these duplicates.
Your answer should consider both exact duplicates and near-duplicates caused by ingestion issues, retries, formatting differences, or multiple processing stages.
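
As a starting point for discussion, the distinction between exact and near-duplicates can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method. The field names (`txn_id`, `card_id`, `merchant`, `amount`, `ts`) and the 120-second retry window are hypothetical assumptions, since the question does not specify a schema or a tolerance.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from decimal import Decimal


def normalize(txn):
    """Canonicalize fields so formatting differences do not hide duplicates.

    All field names here are assumed for illustration only.
    """
    return {
        "txn_id": txn["txn_id"],
        "card_id": txn["card_id"].strip(),
        # Uppercase and collapse runs of whitespace in the merchant name.
        "merchant": " ".join(txn["merchant"].upper().split()),
        # Decimal avoids float rounding noise; quantize to two decimal places.
        "amount": Decimal(str(txn["amount"])).quantize(Decimal("0.01")),
        # Timestamps are assumed to be naive ISO 8601 strings.
        "ts": datetime.fromisoformat(txn["ts"]),
    }


def find_duplicates(txns, window=timedelta(seconds=120)):
    """Return (exact, near) lists of (earlier_id, later_id) pairs.

    exact: same card, merchant, amount, and timestamp after normalization.
    near:  same card, merchant, and amount, with timestamps within `window`
           (an assumed tolerance for retries or reprocessing).
    """
    groups = defaultdict(list)
    for t in map(normalize, txns):
        groups[(t["card_id"], t["merchant"], t["amount"])].append(t)

    exact, near = [], []
    for rows in groups.values():
        rows.sort(key=lambda r: r["ts"])
        # Compare each record with its immediate successor in time order.
        for prev, cur in zip(rows, rows[1:]):
            gap = cur["ts"] - prev["ts"]
            if gap == timedelta(0):
                exact.append((prev["txn_id"], cur["txn_id"]))
            elif gap <= window:
                near.append((prev["txn_id"], cur["txn_id"]))
    return exact, near


# Hypothetical sample data.
txns = [
    {"txn_id": "a1", "card_id": "c-1", "merchant": "Acme  Store",
     "amount": "12.50", "ts": "2024-01-05T10:00:00"},
    {"txn_id": "a2", "card_id": "c-1 ", "merchant": "acme store",
     "amount": 12.5, "ts": "2024-01-05T10:00:00"},
    {"txn_id": "a3", "card_id": "c-1", "merchant": "ACME STORE",
     "amount": "12.50", "ts": "2024-01-05T10:00:45"},
    {"txn_id": "a4", "card_id": "c-1", "merchant": "ACME STORE",
     "amount": "12.50", "ts": "2024-01-06T09:00:00"},
]
exact, near = find_duplicates(txns)
print(exact)  # [('a1', 'a2')]
print(near)   # [('a2', 'a3')]
```

In this sample, `a1` and `a2` differ only in formatting and match exactly once normalized, `a3` looks like a retry 45 seconds later, and `a4` is a repeat purchase the next day that is left unflagged. A near-duplicate flag is only a candidate for review. A strong answer would also discuss how to tell genuine repeat purchases from retries, for example by using authorization codes or idempotency keys where the data provides them.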