Manipulate and merge DataFrames correctly

Q: Manipulate and merge DataFrames correctly

This is a Data Manipulation (SQL/Python) interview question from Boston Consulting Group for Data Scientist roles. View the full question and solution on PracHub.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require a solid understanding of core concepts, plus practice. PracHub provides solutions with explanations to help you master Data Manipulation (SQL/Python) interviews.

Question

Given three pandas DataFrames:

customers
customer_id, join_date, tier
101, 2025-01-02, gold
102, 2025-02-10, silver
103, 2025-03-05, gold

products
model_id, model_name, msrp
1, Sedan, 20000
2, SUV, 30000

orders
order_id, order_date, customer_id, model_id, qty, unit_price, status
1, 2025-08-30, 102, 2, 1, 30000, completed
2, 2025-09-01, 103, 2, 2, 29000, completed
3, 2025-08-26, 101, 1, 1, 19500, returned
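For experimenting with the question, the three sample tables can be built directly in pandas. The snippet below is only a sketch of one way to set them up. The dates are left as strings on purpose, since converting them is part of the task. The closing uniqueness checks are an addition not asked for in the question; they show which key columns are unique, which matters when choosing a merge validation mode.

```python
import pandas as pd

# Sample data copied from the question; dates stay as strings for now.
customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "join_date": ["2025-01-02", "2025-02-10", "2025-03-05"],
    "tier": ["gold", "silver", "gold"],
})

products = pd.DataFrame({
    "model_id": [1, 2],
    "model_name": ["Sedan", "SUV"],
    "msrp": [20000, 30000],
})

orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "order_date": ["2025-08-30", "2025-09-01", "2025-08-26"],
    "customer_id": [102, 103, 101],
    "model_id": [2, 2, 1],
    "qty": [1, 2, 1],
    "unit_price": [30000, 29000, 19500],
    "status": ["completed", "completed", "returned"],
})

# Key uniqueness decides which `validate=` mode a later merge can pass.
print(customers["customer_id"].is_unique)  # True
print(products["model_id"].is_unique)      # True
print(orders["model_id"].is_unique)        # False: model_id 2 appears twice
```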

Tasks:
(a) Filter to completed orders, then add revenue = qty*unit_price and drop status;
(b) merge orders with products and customers using keys (order->products one-to-one, order->customers many-to-one) and enforce merge validation to catch duplicates;
(c) compute discount_pct = clip(1 - unit_price/msrp, lower=0, upper=0.5) and impute missing msrp with the median per model_name;
(d) add first_purchase_date per customer via groupby/transform, then keep only each customer’s first purchase per model;
(e) ensure no SettingWithCopyWarning is raised and set dtypes (categorical for tier and model_name; datetime for dates);
(f) return columns [customer_id, model_name, revenue, discount_pct, first_purchase_date] sorted by revenue desc.

Provide idiomatic, vectorized pandas code that is idempotent.
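One possible approach to tasks (a) through (f) is sketched below. It is not the official PracHub solution. The function name `build_report` is hypothetical, and the frames are assumed to have exactly the columns shown in the question. One deliberate deviation: the task describes order->products as one-to-one, but in the sample data model_id 2 appears in two completed orders, so `validate="one_to_one"` would raise `pandas.errors.MergeError`. The sketch therefore assumes `validate="many_to_one"` for both merges, which still catches duplicate keys in products and customers.

```python
import pandas as pd


def build_report(customers, products, orders):
    """Hypothetical helper: returns a new frame and never mutates its inputs."""
    # (a) Boolean filtering plus assign() builds a new frame, so nothing is
    # written into a slice and no SettingWithCopyWarning is triggered.
    done = (
        orders.loc[orders["status"].eq("completed")]
        .assign(
            order_date=lambda d: pd.to_datetime(d["order_date"]),
            revenue=lambda d: d["qty"] * d["unit_price"],
        )
        .drop(columns="status")
    )

    # (e) Set dtypes on copies; re-running on already-typed input is a no-op.
    prod = products.assign(model_name=lambda d: d["model_name"].astype("category"))
    cust = customers.assign(
        join_date=lambda d: pd.to_datetime(d["join_date"]),
        tier=lambda d: d["tier"].astype("category"),
    )

    # (b) Assumption: many_to_one for both merges (see the note above).
    merged = (
        done.merge(prod, on="model_id", how="left", validate="many_to_one")
        .merge(cust, on="customer_id", how="left", validate="many_to_one")
    )

    # (c) Impute missing msrp with the per-model median, then clip the discount.
    median_msrp = merged.groupby("model_name", observed=True)["msrp"].transform("median")
    msrp = merged["msrp"].fillna(median_msrp)
    merged = merged.assign(
        msrp=msrp,
        discount_pct=(1 - merged["unit_price"] / msrp).clip(lower=0, upper=0.5),
    )

    # (d) Earliest order date per customer, broadcast back with transform.
    merged = merged.assign(
        first_purchase_date=merged.groupby("customer_id")["order_date"].transform("min")
    )
    # Keep the earliest order per (customer, model); order_id breaks date ties.
    first = merged.sort_values(
        ["customer_id", "model_name", "order_date", "order_id"]
    ).drop_duplicates(["customer_id", "model_name"], keep="first")

    # (f) Final column selection, sorted by revenue descending.
    cols = ["customer_id", "model_name", "revenue", "discount_pct", "first_purchase_date"]
    return first[cols].sort_values("revenue", ascending=False).reset_index(drop=True)
```

Called with the three sample frames, this returns two rows: customer 103 (revenue 58000) followed by customer 102 (revenue 30000). The returned order (order_id 3) is filtered out in step (a). Because every step uses `assign` on a fresh frame, calling the function repeatedly on the same inputs gives the same result.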
