PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates proficiency in SQL data manipulation and analytical querying—specifically deduplication using idempotency keys, window functions/CTEs for ranking, distinct aggregation by date, timestamp tie-breaking, and awareness of indexing and scalability trade-offs in large tables; the domain is Data Manipulation (SQL/Python).

  • Medium
  • Google
  • Data Manipulation (SQL/Python)
  • Data Scientist

Deduplicate events and rank products with SQL

Company: Google

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

You are given two tables. Schema: - events(event_id INT PRIMARY KEY, user_id INT, product_id INT, event_time TIMESTAMP, idempotency_key TEXT, amount_cents INT) - products(product_id INT PRIMARY KEY, product_name TEXT) Sample data (minimal but representative): events +----------+---------+------------+---------------------+-----------------+--------------+ | event_id | user_id | product_id | event_time | idempotency_key | amount_cents | +----------+---------+------------+---------------------+-----------------+--------------+ | 101 | 1 | 10 | 2025-08-31 23:59:58 | abc | 1299 | | 102 | 1 | 10 | 2025-08-31 23:59:59 | abc | 1299 | | 103 | 2 | 10 | 2025-09-01 00:00:03 | def | 1299 | | 104 | 2 | 10 | 2025-09-01 00:05:01 | def | 1299 | | 105 | 2 | 20 | 2025-09-01 00:06:00 | ghi | 2599 | | 106 | 3 | 20 | 2025-09-01 12:00:00 | jkl | 2599 | | 107 | 1 | 30 | 2025-09-01 12:05:00 | mno | 3099 | +----------+---------+------------+---------------------+-----------------+--------------+ products +------------+--------------+ | product_id | product_name | +------------+--------------+ | 10 | Basic Tee | | 20 | Hoodie | | 30 | Socks | +------------+--------------+ Task A — De-duplicate retry events: Some payments are retried and share the same (user_id, idempotency_key). Write ANSI SQL that keeps exactly one row per (user_id, idempotency_key), choosing the row with the earliest event_time; break ties by the smallest event_id. Return all columns of the kept rows. Task B — Rank products by distinct purchasers for a given date: Using the de-duplicated rows from Task A, write a single SQL query that returns, for the calendar date 2025-09-01 (UTC), the top 2 products by distinct purchasing users. - Output columns: product_id, product_name, distinct_buyers, rank. - Use window functions to compute ranks; break ties by product_id ASC. - Do not use temporary tables; a CTE-based solution is acceptable. Explain how your solution scales if events has 10^9 rows and how you would index to support it.

Quick Answer: This question evaluates proficiency in SQL data manipulation and analytical querying—specifically deduplication using idempotency keys, window functions/CTEs for ranking, distinct aggregation by date, timestamp tie-breaking, and awareness of indexing and scalability trade-offs in large tables; the domain is Data Manipulation (SQL/Python).

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Generate binomial matrix and column-normalize - Google (Medium)
  • Analyze video flags and reviews with SQL - Google (Medium)
  • Write SQL/Python for messy event data - Google (Medium)
  • Add a conditional column in Python - Google (Medium)
  • Find most co‑purchased product pairs in SQL - Google (Medium)