PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches

Quick Overview

This question evaluates SQL and pandas proficiency for event-level data manipulation, including time zone–aware local date conversion, session-based deduplication, cohort and conversion metric calculation, and basic experiment sanity checking via a sample-ratio-mismatch p-value.

  • Medium
  • Google
  • Data Manipulation (SQL/Python)
  • Data Scientist

Write SQL/Python for messy event data

Company: Google

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Using the schema and sample data below, write: (1) a single SQL query to compute daily metrics for the local date 2025-09-01 in America/Los_Angeles, and (2) a Python (pandas) transformation. Goal A (SQL): For 2025-09-01 (America/Los_Angeles local date derived from UTC timestamps), output one row with: - new_buyers: users whose first paid order occurred on that local date, excluding paid orders refunded within 24 hours of order_time_utc and all canceled orders. - cart_to_paid_new: among users counted as new_buyers, the share who had at least one add_to_cart event in the 7 local days prior to their first paid order; deduplicate add_to_cart events within the same session_id by collapsing events that are ≤10 minutes apart into one. - cart_to_paid_returning: same conversion on that local date for users who placed a paid order that day but whose first paid order was before 2025-09-01. - srm_p_value: a sample-ratio-mismatch p-value for the treatment vs control split among users observed on 2025-09-01 (based on experiments.variant), assuming 50/50 expected split. Assumptions: timestamps are stored in UTC; convert to America/Los_Angeles for local dates; if multiple paid orders exist on the first-paid day, treat the earliest qualifying paid order as first. Goal B (Python/pandas): Given a DataFrame events with the columns below (including a dict-like props_json), produce a DataFrame with columns [user_id, local_dt, sku, dedup_add_to_cart_cnt] where local_dt is the America/Los_Angeles date of event_time_utc; deduplicate add_to_cart within each [user_id, session_id, sku] by treating any subsequent add_to_cart within 2 minutes as the same action; count distinct deduped add_to_cart per [user_id, local_dt, sku]. Show idiomatic pandas code (no UDFs) and explain time zone handling. Schema and small ASCII samples: users --------- user_id | signup_utc | referrer 1 | 2025-08-20 13:02:10 | ads 2 | 2025-08-28 21:50:05 | seo 3 | 2025-08-31 02:11:34 | direct 4 | 2025-09-01 04:00:12 | ads 5 | 2025-09-01 05:47:40 | partner events --------------------------------------------------------------------------------------------- user_id | event_time_utc | event_type | session_id | device | props_json 1 | 2025-08-30 16:00:00 | add_to_cart | s1 | ios | {"sku":"A1","qty":1} 1 | 2025-08-30 16:01:00 | add_to_cart | s1 | ios | {"sku":"A1","qty":1} 1 | 2025-08-30 16:05:00 | purchase_click | s1 | ios | {} 2 | 2025-09-01 01:02:03 | add_to_cart | s2 | web | {"sku":"B2","qty":2} 3 | 2025-08-25 09:10:00 | add_to_cart | s3 | android | {"sku":"C3","qty":1} orders -------------------------------------------------------------------------------------- order_id | user_id | order_time_utc | amount_usd | status | refund_time_utc 10 | 1 | 2025-09-01 16:15:00 | 120.00 | paid | 2025-09-01 18:00:00 11 | 1 | 2025-09-01 16:20:00 | 35.00 | canceled | null 12 | 2 | 2025-09-01 02:15:00 | 50.00 | paid | null 13 | 3 | 2025-08-26 11:00:00 | 20.00 | paid | 2025-08-27 08:00:00 14 | 4 | 2025-09-02 07:00:00 | 15.00 | paid | null experiments ----------------------------- user_id | dt | variant 1 | 2025-09-01 | treatment 2 | 2025-09-01 | control 3 | 2025-09-01 | treatment 4 | 2025-09-02 | control 5 | 2025-09-01 | treatment

Quick Answer: This question evaluates SQL and pandas proficiency for event-level data manipulation, including time zone–aware local date conversion, session-based deduplication, cohort and conversion metric calculation, and basic experiment sanity checking via a sample-ratio-mismatch p-value.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Generate binomial matrix and column-normalize - Google (Medium)
  • Analyze video flags and reviews with SQL - Google (Medium)
  • Add a conditional column in Python - Google (Medium)
  • Find most co‑purchased product pairs in SQL - Google (Medium)
  • Design a scalable video platform database - Google (Medium)