PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates SQL and Python data-manipulation competencies, covering date-based user filtering, joins and deduplication across events, users and pin metadata, aggregated CTR calculation with division-by-zero handling, and vectorized Pandas grouping, NaN handling, case-insensitive filtering, and tie-breaking.

  • Medium
  • Upstart
  • Data Manipulation (SQL/Python)
  • Data Scientist

Solve SQL CTR and Python analytics tasks

Company: Upstart

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: HR Screen

Part A — SQL: Compute click-through rate (CTR) by pin_format for US new users. New users are those whose signup_date is within 30 days (inclusive) of the action event_date. Schema (invented): - events(event_date DATE, user_id INT, pin_id INT, event_type STRING CHECK IN ('impression','click'), event_count INT) - users(user_id INT, country_code STRING, signup_date DATE) - pin_classification(pin_id INT, pin_format STRING CHECK IN ('video','static','story','other')) Notes/requirements: - Join events→users on user_id and events→pin_classification on pin_id. - Filter users.country_code = 'US'. - Define new user as 0 ≤ DATEDIFF(event_date, signup_date) ≤ 30 (inclusive). - Aggregate impressions = SUM(event_count WHERE event_type='impression'); clicks = SUM(event_count WHERE event_type='click'). - CTR = clicks / impressions, computed after aggregation at the pin_format level; exclude pin_formats whose aggregated impressions = 0 to avoid division by zero. - Return one row per pin_format with columns: pin_format, impressions, clicks, ctr (rounded to 4 decimals), and the distinct_new_users contributing to those events. - Ensure no double counting if multiple rows per (event_date, user_id, pin_id, event_type). ASCII sample tables: users +---------+--------------+-------------+ | user_id | country_code | signup_date | +---------+--------------+-------------+ | 101 | US | 2025-07-20 | | 102 | US | 2025-08-05 | | 103 | CA | 2025-08-01 | | 104 | US | 2025-06-10 | +---------+--------------+-------------+ pin_classification +--------+------------+ | pin_id | pin_format | +--------+------------+ | 10 | video | | 11 | static | | 12 | video | | 13 | story | +--------+------------+ events +------------+---------+--------+------------+-------------+ | event_date | user_id | pin_id | event_type | event_count | +------------+---------+--------+------------+-------------+ | 2025-08-01 | 101 | 10 | impression | 5 | | 2025-08-01 | 101 | 10 | click | 1 | | 2025-08-10 | 102 | 11 | impression | 3 | | 2025-08-10 | 102 | 11 | click | 0 | | 2025-08-15 | 103 | 12 | impression | 4 | | 2025-08-15 | 103 | 12 | click | 2 | | 2025-08-20 | 104 | 13 | impression | 7 | | 2025-08-20 | 104 | 13 | click | 1 | +------------+---------+--------+------------+-------------+ Question: Write a single SQL query that produces the required output, handling all constraints above. Part B — Python (Pandas + core Python): 1) Given a DataFrame df with columns [pin_id:int, category_id:int, time_spent_seconds:float, pin_format:str] and a dict category_map = {category_id:int -> category_name:str}, return the category_name with the highest average time_spent_seconds among rows where pin_format case-insensitively equals 'video'. Break ties by lexicographically smallest category_name. Exclude null/NaN time_spent_seconds. Aim for vectorized Pandas (no Python loops over rows). 2) Given a dict user_to_pins = {'user1':[1,2,2], 'user2':[1,2,3], ...}, compute the average number of unique pin_ids per user (i.e., mean over users of |set(pin_ids)|), returning a float. Handle empty lists, extremely large lists, and memory efficiency for millions of users. Provide time/space complexity and code.

Quick Answer: This question evaluates SQL and Python data-manipulation competencies, covering date-based user filtering, joins and deduplication across events, users and pin metadata, aggregated CTR calculation with division-by-zero handling, and vectorized Pandas grouping, NaN handling, case-insensitive filtering, and tie-breaking.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Write monthly touches and last-touch SQL - Upstart (Medium)
  • Manipulate data in R with dplyr joins and windows - Upstart (Medium)
  • Calculate Average Event Value by User ID - Upstart (Medium)
  • Calculate User Revenue and Session Duration in Python - Upstart (Medium)