Merge CSVs and build revenue pivot with pandas

Q: How do I practice SQL interview questions?

PracHub provides an interactive SQL console where you can write and test queries against real database schemas. Get instant feedback and compare your solution with the expected output.

Q: What difficulty level is this coding question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Take-home Project rounds at Capital One.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Capital One during technical interviews.

Question

You receive four CSVs and must replicate an Excel VLOOKUP + PivotTable workflow using Python/pandas.

CSV samples:
customers.csv
customer_id,signup_date,channel
1,2025-06-02,Organic
2,2025-06-10,Ads
3,2025-07-05,Referral

orders.csv
order_id,customer_id,order_date,total_amount
101,1,2025-06-12,120
102,2,2025-06-28,80
103,1,2025-07-15,200
104,3,2025-08-02,50

refunds.csv
order_id,refund_amount
102,20
104,50

targets.csv
month,channel,revenue_target
2025-06,Organic,100
2025-06,Ads,120
2025-07,Organic,150
2025-08,Referral,60

Task: Write pandas code to (a) robustly load these CSVs (assume one file path may initially fail—gracefully retry or fall back without crashing), (b) compute net_revenue per order = total_amount - COALESCE(refund_amount,0), (c) roll up to monthly net revenue by channel for months 2025-06 through 2025-08 based on order_date, (d) left-join the result to targets.csv on (month, channel) to compute target_gap = net_revenue - revenue_target (treat missing targets as 0), and (e) produce a pivot table with index=month (YYYY-MM), columns=channel, values=net_revenue, plus an additional similarly-shaped table for target_gap. Ensure date parsing is correct, missing refunds are handled, and channels with no orders in a month still appear with 0 in the pivot.

PracHub · Accepted Answer

This question evaluates proficiency in data manipulation and ETL using pandas, covering robust file I/O, join/merge semantics, null/coalescing behavior, date parsing, aggregation, and pivoting.

Quick Overview

Quick Overview