PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches
|Home/Data Manipulation (SQL/Python)/CVS Health

Use pandas to aggregate, pivot, and label

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in pandas-based tabular data manipulation—specifically DataFrame merging, groupby aggregation, pivot table construction, counting unique values, and conditional flagging with numpy—within the domain of Data Manipulation (SQL/Python) and is primarily at the practical application level.

  • Medium
  • CVS Health
  • Data Manipulation (SQL/Python)
  • Data Scientist

Use pandas to aggregate, pivot, and label

Company: CVS Health

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Given two pandas DataFrames, write code to: (1) merge and aggregate revenue; (2) produce a 2x2 pivot; (3) compute per-state counts with value_counts, nunique/size; (4) add a binary flag via np.where. Reuse the merged DataFrame across parts (assume it persists between steps). Data (toy, representative) users user_id | is_member | state | age 101 | 1 | CA | 29 102 | 0 | NY | 41 103 | 1 | CA | 35 104 | 0 | TX | 50 orders order_id | user_id | channel | amount | status 7001 | 101 | SMS | 12.00 | delivered 7002 | 102 | Email | 5.00 | delivered 7003 | 103 | SMS | 7.00 | delivered 7004 | 103 | Email | 4.00 | delivered 7005 | 101 | Organic | 3.50 | delivered 7006 | 104 | SMS | 6.00 | undelivered Tasks - Step 1: Merge orders with users on user_id (left join). Compute two outputs: (a) total delivered revenue by channel; (b) delivered revenue by channel restricted to members (is_member==1). Show groupby(...).sum() results as DataFrames. - Step 2: Create a 2x2 pivot of delivered revenue with index=is_member (0/1) and columns=channel in ['SMS','Email'] only, values=amount, aggfunc='sum', fill missing cells with 0. Use pivot_table with aggfunc='sum'. - Step 3: From the merged DataFrame, compute per-state: total orders (size) and unique purchasers (nunique of user_id). Return the top-2 states by total orders using sort_values. - Step 4: Add column high_value_flag = 1 if (user's lifetime delivered amount >= 15) OR (number of delivered SMS orders per user >= 2), else 0. Use np.where and prior groupby aggregations to avoid SettingWithCopy warnings. Show the final head with relevant columns.

Quick Answer: This question evaluates competency in pandas-based tabular data manipulation—specifically DataFrame merging, groupby aggregation, pivot table construction, counting unique values, and conditional flagging with numpy—within the domain of Data Manipulation (SQL/Python) and is primarily at the practical application level.

Related Interview Questions

  • Create and query an e-commerce schema - CVS Health (Medium)
  • Aggregate radiology spend and derive fiscal month - CVS Health (Medium)
  • Compute age-band spend and YoY in Georgia - CVS Health (Medium)
  • Calculate annual percentages and YoY by cohorts - CVS Health (Medium)
  • Write SQL for dedup and purchase shares - CVS Health (Medium)
CVS Health logo
CVS Health
Oct 13, 2025, 9:49 PM
Data Scientist
Technical Screen
Data Manipulation (SQL/Python)
5
0

Given two pandas DataFrames, write code to: (1) merge and aggregate revenue; (2) produce a 2x2 pivot; (3) compute per-state counts with value_counts, nunique/size; (4) add a binary flag via np.where. Reuse the merged DataFrame across parts (assume it persists between steps).

Data (toy, representative) users user_id | is_member | state | age 101 | 1 | CA | 29 102 | 0 | NY | 41 103 | 1 | CA | 35 104 | 0 | TX | 50

orders order_id | user_id | channel | amount | status 7001 | 101 | SMS | 12.00 | delivered 7002 | 102 | Email | 5.00 | delivered 7003 | 103 | SMS | 7.00 | delivered 7004 | 103 | Email | 4.00 | delivered 7005 | 101 | Organic | 3.50 | delivered 7006 | 104 | SMS | 6.00 | undelivered

Tasks

  • Step 1: Merge orders with users on user_id (left join). Compute two outputs: (a) total delivered revenue by channel; (b) delivered revenue by channel restricted to members (is_member==1). Show groupby(...).sum() results as DataFrames.
  • Step 2: Create a 2x2 pivot of delivered revenue with index=is_member (0/1) and columns=channel in ['SMS','Email'] only, values=amount, aggfunc='sum', fill missing cells with 0. Use pivot_table with aggfunc='sum'.
  • Step 3: From the merged DataFrame, compute per-state: total orders (size) and unique purchasers (nunique of user_id). Return the top-2 states by total orders using sort_values.
  • Step 4: Add column high_value_flag = 1 if (user's lifetime delivered amount >= 15) OR (number of delivered SMS orders per user >= 2), else 0. Use np.where and prior groupby aggregations to avoid SettingWithCopy warnings. Show the final head with relevant columns.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More CVS Health•More Data Scientist•CVS Health Data Scientist•CVS Health Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.