PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches

Quick Overview

This question evaluates proficiency in pandas-based data manipulation, including flattening nested list and dict structures, parsing nested attributes, computing per-user aggregates and segment metrics, and implementing robust, vectorized extraction logic that handles missing keys and avoids SettingWithCopy pitfalls.

  • Medium
  • Pinterest
  • Data Manipulation (SQL/Python)
  • Data Scientist

Transform nested dicts with pandas apply/lambda

Company: Pinterest

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Onsite

Given a pandas DataFrame df with columns: user_id (int), ts (datetime64[ns]), events (list of dicts), attrs (dict). Example rows (conceptual): user_id=1, ts=2025-08-08 09:12:00, events=[{"type":"click","ts":"2025-08-08T09:12:00"},{"type":"view","ts":"2025-08-08T09:12:05"}], attrs={"version":"1.2.0","flags":{"beta":true,"dark":false}} user_id=1, ts=2025-08-09 15:00:00, events=[{"type":"purchase","ts":"2025-08-09T15:00:00","amount":20.0}], attrs={"version":"1.2.0","flags":{"beta":true,"dark":false}} user_id=2, ts=2025-08-04 12:30:00, events=[{"type":"view","ts":"2025-08-04T12:30:00"}], attrs={"version":"1.1.0","flags":{"beta":false,"dark":true}} Tasks (write Python/pandas): 1) Explode events into a long table with one row per event: columns [user_id, event_type, event_ts (datetime), amount (nullable), attrs_version, attrs_beta_flag, attrs_dark_flag]. Use Series.explode and apply/lambda to parse dicts; no for-loops over DataFrame rows. 2) Aggregate to a per-user features table with columns: click_count, view_count, purchase_count, last_event_ts, total_purchase_amount, version_most_recent, beta_flag_most_recent, dark_flag_most_recent. Use groupby with named aggregations; where multiple versions exist, take the attrs associated with the most recent event. 3) From the features table, compute a conversion_rate by user segment defined by (version_most_recent, beta_flag_most_recent, dark_flag_most_recent): purchases/users. Return a compact DataFrame with one row per segment and columns [version, beta_flag, dark_flag, users, purchasers, conversion_rate]. 4) Implement a robust helper that safely extracts nested keys from attrs using a lambda and dictionary iteration (handle missing keys and None). Explain in comments why your approach avoids SettingWithCopy pitfalls and preserves vectorization.

Quick Answer: This question evaluates proficiency in pandas-based data manipulation, including flattening nested list and dict structures, parsing nested attributes, computing per-user aggregates and segment metrics, and implementing robust, vectorized extraction logic that handles missing keys and avoids SettingWithCopy pitfalls.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Write Queries for Pinterest Engagement Tasks - Pinterest (medium)
  • Write SQL for top categories and highly active users - Pinterest (easy)
  • Compute percent of first-cancelled users who never rebook - Pinterest (medium)
  • Compute percent of first-cancel users who never return - Pinterest (easy)
  • Find top category by video time spent - Pinterest (Medium)