PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates SQL and pandas proficiency in data manipulation tasks including missing-value imputation, tiered aggregation, joins and upsert/deduplication, time-windowed revenue computation, cohort retention analysis, and handling of monetary versus non-monetary events within a relational schema.

  • Medium
  • Capital One
  • Data Manipulation (SQL/Python)
  • Data Scientist

Impute, join, and upsert using SQL and Python

Company: Capital One

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Write both SQL and Python (pandas) to complete the following data-manipulation tasks. Assume today is 2025-09-01 for any time filters. Schema: customers(customer_id INT, signup_date DATE, age INT, tier TEXT) events(event_id INT, customer_id INT, event_date DATE, event_type TEXT, amount DECIMAL(10,2)) staging_events(event_id INT, customer_id INT, event_date DATE, event_type TEXT, amount DECIMAL(10,2)) payments(payment_id INT, customer_id INT, payment_date DATE, amount DECIMAL(10,2)) Sample data (minimal): customers +-------------+-------------+-----+--------+ | customer_id | signup_date | age | tier | +-------------+-------------+-----+--------+ | 1 | 2025-08-20 | 34 | gold | | 2 | 2025-08-28 | NULL| silver | | 3 | 2025-08-29 | 27 | silver | | 4 | 2025-08-30 | NULL| bronze | +-------------+-------------+-----+--------+ events +----------+-------------+-------------+------------+--------+ | event_id | customer_id | event_date | event_type | amount | +----------+-------------+-------------+------------+--------+ | 10 | 1 | 2025-08-28 | purchase | 120.00 | | 11 | 2 | 2025-08-30 | purchase | 80.00 | | 12 | 2 | 2025-09-01 | refund | -20.00 | | 13 | 3 | 2025-08-26 | purchase | 60.00 | | 14 | 4 | 2025-08-27 | page_view | NULL | +----------+-------------+-------------+------------+--------+ staging_events +----------+-------------+-------------+------------+--------+ | event_id | customer_id | event_date | event_type | amount | +----------+-------------+-------------+------------+--------+ | 12 | 2 | 2025-09-01 | refund | -20.00 | | 15 | 1 | 2025-08-31 | purchase | 120.00 | +----------+-------------+-------------+------------+--------+ payments +------------+-------------+--------------+--------+ | payment_id | customer_id | payment_date | amount | +------------+-------------+--------------+--------+ | 100 | 1 | 2025-08-31 | 120.00 | | 101 | 3 | 2025-08-31 | 60.00 | +------------+-------------+--------------+--------+ Tasks: A) Impute missing ages in customers using the median age within tier, falling back to the global median if a tier’s median is null; return customer_id and imputed_age. B) Upsert from staging_events into events: insert rows whose event_id does not exist in events; if an event_id exists in both with different values, keep a single row with the latest event_date and its values; return the deduplicated events table. C) For the last 7 days inclusive (2025-08-26 to 2025-09-01), compute per-tier net revenue where purchase amounts are positive and refund amounts are negative; exclude non-monetary events (like page_view); use imputed_age and restrict to customers aged 18–65; return tier, total_revenue_7d, and customer_count_7d. D) Compute 7-day retention: among customers with a monetary event in the last 7 days, what fraction also had any event in the prior 7-day window (2025-08-19 to 2025-08-25)? Return one row per tier with retention_rate. Provide both SQL and pandas solutions; state any indexing choices and how you would test correctness.

Quick Answer: This question evaluates SQL and pandas proficiency in data manipulation tasks including missing-value imputation, tiered aggregation, joins and upsert/deduplication, time-windowed revenue computation, cohort retention analysis, and handling of monetary versus non-monetary events within a relational schema.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Clean and Merge Housing Data - Capital One (easy)
  • Find Lowest Prices for Highly Rated Categories - Capital One (medium)
  • Write SQL to compute campaign net revenue - Capital One (Medium)
  • Merge CSVs and build revenue pivot with pandas - Capital One (Medium)
  • Find top category per region in Aug 2025 - Capital One (Medium)