PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates pandas-based data manipulation skills, specifically proficiency with groupby/agg, merge/concat, rolling-window and time-series aggregations, deduplication and tie-breaking logic for ranking and joins.

  • Medium
  • Amazon
  • Data Manipulation (SQL/Python)
  • Data Scientist

Transform retail data with pandas groupby/merge/concat

Company: Amazon

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Using pandas only (groupby/agg/merge/concat; no for-loops), write code to answer the sub-questions below on the following small dataframes. Assume timestamps are UTC. users user_id | signup_date | region | device U1 | 2025-08-28 | US | iOS U2 | 2025-08-30 | US | Web U3 | 2025-09-01 | CA | Android U4 | 2025-09-02 | US | Web orders order_id | user_id | ts | amount | category O1 | U1 | 2025-09-01 10:05 | 20.00 | Books O2 | U2 | 2025-09-02 09:10 | 35.00 | Home O3 | U1 | 2025-09-02 11:00 | 15.00 | Books O4 | U3 | 2025-09-03 12:00 | 50.00 | Games O5 | U2 | 2025-09-03 13:30 | 60.00 | Home events user_id | ts | event U1 | 2025-09-01 09:00 | view U1 | 2025-09-01 09:05 | add_to_cart U2 | 2025-09-02 09:00 | view U3 | 2025-09-03 11:50 | view U3 | 2025-09-03 11:55 | view U4 | 2025-09-03 15:00 | view Tasks: 1) Daily Active Users (DAU) by device: compute DAU per calendar day using events, then compute a 2-day rolling unique user count per device (aligned to day end). Explain how you ensure uniqueness across days. 2) First order: left-join users to first order per user to produce first_order_ts and days_to_first_order (float days), with NaN for users without orders. Be careful about users who have same-day signup and order. 3) Top categories per user: compute each user’s top-2 categories by total spend (ties broken alphabetically), returning columns top_cat_1 and top_cat_2; users with <2 categories should have NaN for missing values. 4) Add an "ALL" summary row that aggregates overall revenue by day across all devices and concatenate it to the per-device daily revenue table (schema: day, device, revenue). Ensure consistent column types after concat. For each step, provide pandas code, resulting schema, and explain time/memory complexity and edge cases (empty joins, duplicate events, differing time zones).

Quick Answer: This question evaluates pandas-based data manipulation skills, specifically proficiency with groupby/agg, merge/concat, rolling-window and time-series aggregations, deduplication and tie-breaking logic for ranking and joins.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Find recommended friend pairs by shared songs - Amazon (medium)
  • Find recommended friend pairs by shared listening - Amazon (easy)
  • Write SQL window functions for D7 retention - Amazon (medium)
  • Find daily first-order merchants with SQL - Amazon (Medium)
  • Design student–course data models and SQL - Amazon (Medium)