Use pandas to aggregate, pivot, and label

Q: Use pandas to aggregate, pivot, and label

This question evaluates competency in pandas-based tabular data manipulation—specifically DataFrame merging, groupby aggregation, pivot table construction, counting unique values, and conditional flagging with numpy—within the domain of Data Manipulation (SQL/Python) and is primarily at the practical application level.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Question

Given two pandas DataFrames, write code to: (1) merge and aggregate revenue; (2) produce a 2x2 pivot; (3) compute per-state counts with value_counts, nunique/size; (4) add a binary flag via np.where. Reuse the merged DataFrame across parts (assume it persists between steps).

Data (toy, representative) users user_id | is_member | state | age 101 | 1 | CA | 29 102 | 0 | NY | 41 103 | 1 | CA | 35 104 | 0 | TX | 50

orders order_id | user_id | channel | amount | status 7001 | 101 | SMS | 12.00 | delivered 7002 | 102 | Email | 5.00 | delivered 7003 | 103 | SMS | 7.00 | delivered 7004 | 103 | Email | 4.00 | delivered 7005 | 101 | Organic | 3.50 | delivered 7006 | 104 | SMS | 6.00 | undelivered

Tasks

Step 1: Merge orders with users on user_id (left join). Compute two outputs: (a) total delivered revenue by channel; (b) delivered revenue by channel restricted to members (is_member==1). Show groupby(...).sum() results as DataFrames.
Step 2: Create a 2x2 pivot of delivered revenue with index=is_member (0/1) and columns=channel in ['SMS','Email'] only, values=amount, aggfunc='sum', fill missing cells with 0. Use pivot_table with aggfunc='sum'.
Step 3: From the merged DataFrame, compute per-state: total orders (size) and unique purchasers (nunique of user_id). Return the top-2 states by total orders using sort_values.
Step 4: Add column high_value_flag = 1 if (user's lifetime delivered amount >= 15) OR (number of delivered SMS orders per user >= 2), else 0. Use np.where and prior groupby aggregations to avoid SettingWithCopy warnings. Show the final head with relevant columns.

Use pandas to aggregate, pivot, and label

Overview

Comments (0)