You are given two tables.
Table: transactions
| Column | Type | Description |
|---|
| transaction_id | int | Unique identifier for each transaction. |
| vendor_id | int | Identifier of the vendor. |
| amount_charged | int | Amount charged in the transaction. |
| transaction_type | varchar | Either PURCHASE or REFUND. |
| refund_transaction_id | int | For a refund, the original purchase transaction id; otherwise null. |
| timestamp | datetime | Time when the transaction happened. |
Sample rows:
| transaction_id | vendor_id | amount_charged | transaction_type | refund_transaction_id | timestamp |
|---|
| 101 | 1 | 200 | PURCHASE | null | 2025-01-01 |
| 102 | 1 | 150 | PURCHASE | null | 2025-01-10 |
| 103 | 1 | 200 | REFUND | 101 | 2025-01-15 |
| 104 | 2 | 300 | PURCHASE | null | 2025-02-01 |
| 105 | 3 | 500 | PURCHASE | null | 2025-03-01 |
Table: vendors
| Column | Type | Description |
|---|
| vendor_id | int | Unique vendor identifier. |
| city | varchar | Vendor city. |
| state | varchar | Vendor state. |
| country | varchar | Vendor country. |
Sample rows:
| vendor_id | city | state | country |
|---|
| 1 | San Jose | CA | US |
| 2 | San Jose | CA | US |
| 3 | Austin | TX | US |
| 4 | Seattle | WA | US |
| 5 | Dallas | TX | US |
Assume the past two years are measured relative to the current date. Treat purchases as positive revenue and refunds as negative revenue unless the question explicitly asks for gross charged amount.
Answer the following SQL questions:
-
Find the top 3 vendors with the highest total net amount charged over the past 2 years.
-
For each state and city, find the top 3 vendors with the highest total net amount charged over the past 2 years.
-
Propose several data quality or anomaly checks for this transaction data, and write SQL queries to detect those issues. Examples may include invalid refund references, refund amounts that do not match the original purchase, duplicate transaction ids, vendors missing from the vendor table, and unusual spikes in transaction volume or amount.