PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Data Manipulation (SQL/Python)/Boston Consulting Group

Transform and aggregate messy event data

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in data wrangling with pandas, including datetime parsing and timezone considerations, currency/string cleaning and coercion to numeric types with handling of missing or malformed amounts, DataFrame column alignment and concatenation, joins, and group-by aggregations to produce per-country/plan day-level metrics. It is commonly asked to assess practical data-manipulation skills for producing analysis-ready metrics from messy event and user datasets, falls under the Data Manipulation (SQL/Python) domain, and targets practical application rather than purely conceptual understanding.

  • Medium
  • Boston Consulting Group
  • Data Manipulation (SQL/Python)
  • Data Scientist

Transform and aggregate messy event data

Company: Boston Consulting Group

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Take-home Project

Using pandas (vectorized; no loops), clean, combine, and aggregate the following to produce country/plan day-level metrics for 2025-08-31. DataFrames (ASCII): users user_id | signup_date | plan | monthly_fee | country 1 | 2025-08-30 | Pro | "$12.50" | US 2 | 2025-08-31 | Free | "$0" | US 3 | 2025-07-15 | Pro | " $15 " | CA 4 | 2025-08-31 | Pro | "$12.50" | US events_august user_id | event_time | event_type | amount 1 | 2025-08-31 09:12:00 | login | 1 | 2025-08-31 09:15:00 | purchase | "12.50" 2 | 2025-08-31 10:01:00 | login | 3 | 2025-08-30 23:58:00 | login | 4 | 2025-08-31 09:59:00 | purchase | "$12.50" events_august_extra (same fields, shuffled order): event_type | amount | user_id | event_time login | | 1 | 2025-08-31 10:10:00 purchase | "12.50" | 2 | 2025-08-31 10:30:00 Tasks: 1) Convert users.signup_date and all event_time to pandas datetime; strip currency/whitespace and coerce monthly_fee and amount to float (NaN on errors). 2) Column-align and concatenate events_august and events_august_extra into events_all. 3) Join users to events_all on user_id. 4) For date 2025-08-31 only, compute per (country, plan): active_users = count of distinct users with ≥1 login; purchasers = count of distinct users with ≥1 purchase; purchases_count = number of purchase events; revenue = sum(amount over purchase events). 5) Return a tidy DataFrame with columns [date, country, plan, active_users, purchasers, purchases_count, revenue] sorted by country asc, plan asc. State any assumptions about missing amounts and timezone handling.

Quick Answer: This question evaluates a candidate's competency in data wrangling with pandas, including datetime parsing and timezone considerations, currency/string cleaning and coercion to numeric types with handling of missing or malformed amounts, DataFrame column alignment and concatenation, joins, and group-by aggregations to produce per-country/plan day-level metrics. It is commonly asked to assess practical data-manipulation skills for producing analysis-ready metrics from messy event and user datasets, falls under the Data Manipulation (SQL/Python) domain, and targets practical application rather than purely conceptual understanding.

Related Interview Questions

  • Transform messy transactions with pandas - Boston Consulting Group (Medium)
  • Unify 7 tables and impute missing values - Boston Consulting Group (Medium)
  • Query top spenders and 7-day growth - Boston Consulting Group (Medium)
  • Manipulate and merge DataFrames correctly - Boston Consulting Group (Medium)
  • Merge and Concatenate Inconsistent Order Files with Pandas - Boston Consulting Group (Medium)
Boston Consulting Group logo
Boston Consulting Group
Oct 13, 2025, 9:49 PM
Data Scientist
Take-home Project
Data Manipulation (SQL/Python)
1
0

Using pandas (vectorized; no loops), clean, combine, and aggregate the following to produce country/plan day-level metrics for 2025-08-31. DataFrames (ASCII): users user_id | signup_date | plan | monthly_fee | country 1 | 2025-08-30 | Pro | "12.50"∣US2∣2025−08−31∣Free∣"12.50" | US 2 | 2025-08-31 | Free | "12.50"∣US2∣2025−08−31∣Free∣"0" | US 3 | 2025-07-15 | Pro | " 15"∣CA4∣2025−08−31∣Pro∣"15 " | CA 4 | 2025-08-31 | Pro | "15"∣CA4∣2025−08−31∣Pro∣"12.50" | US events_august user_id | event_time | event_type | amount 1 | 2025-08-31 09:12:00 | login | 1 | 2025-08-31 09:15:00 | purchase | "12.50" 2 | 2025-08-31 10:01:00 | login | 3 | 2025-08-30 23:58:00 | login | 4 | 2025-08-31 09:59:00 | purchase | "$12.50" events_august_extra (same fields, shuffled order): event_type | amount | user_id | event_time login | | 1 | 2025-08-31 10:10:00 purchase | "12.50" | 2 | 2025-08-31 10:30:00 Tasks: 1) Convert users.signup_date and all event_time to pandas datetime; strip currency/whitespace and coerce monthly_fee and amount to float (NaN on errors). 2) Column-align and concatenate events_august and events_august_extra into events_all. 3) Join users to events_all on user_id. 4) For date 2025-08-31 only, compute per (country, plan): active_users = count of distinct users with ≥1 login; purchasers = count of distinct users with ≥1 purchase; purchases_count = number of purchase events; revenue = sum(amount over purchase events). 5) Return a tidy DataFrame with columns [date, country, plan, active_users, purchasers, purchases_count, revenue] sorted by country asc, plan asc. State any assumptions about missing amounts and timezone handling.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Boston Consulting Group•More Data Scientist•Boston Consulting Group Data Scientist•Boston Consulting Group Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.