PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCareers

Quick Overview

This question evaluates SQL-based data manipulation and Python set/co-occurrence computation skills, covering aggregation, joins, timestamp-based cohorting and windowing, freshness detection, CTR calculation, and similarity scoring within a data scientist context.

  • medium
  • Pinterest
  • Data Manipulation (SQL/Python)
  • Data Scientist

Write Queries for Pinterest Engagement Tasks

Company: Pinterest

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: medium

Interview Round: Technical Screen

You are given several data-manipulation tasks based on Pinterest-style product data. Use UTC for all timestamp-to-date conversions unless stated otherwise. Treat time windows as inclusive at the start and exclusive at the end. Task 1: SQL CTR by pin format for new US users Tables: `events` - `event_id BIGINT`: unique event identifier. - `user_id BIGINT`: user who took the action. - `pin_id BIGINT`: pin involved in the action. - `event_timestamp TIMESTAMP`: time of the action in UTC. - `event_type VARCHAR`: either `impression` or `click`. `users_d` - `user_id BIGINT`: unique user identifier. - `signup_timestamp TIMESTAMP`: user signup time in UTC. - `country_code VARCHAR`: ISO country code, for example `US`. `pins_classification` - `pin_id BIGINT`: unique pin identifier. - `pin_format VARCHAR`: pin format, for example `image`, `video`, or `idea`. - `category_id BIGINT`: optional content category identifier. Relationships: - `events.user_id = users_d.user_id`. - `events.pin_id = pins_classification.pin_id`. Definition: - Click-through rate, or CTR, is `number of clicks / number of impressions`. - A user is considered new for an event if the user signed up before or at the event timestamp and less than 30 days before the event timestamp. - Only include users with `country_code = US`. Write a SQL query that returns one row per `event_date` and `pin_format` with the following columns: - `event_date DATE` - `pin_format VARCHAR` - `impressions BIGINT` - `clicks BIGINT` - `ctr DOUBLE` Task 2: SQL percentage of users who saw fresh content Tables: `pins` - `pin_id BIGINT`: unique pin identifier. - `created_timestamp TIMESTAMP`: pin creation time in UTC. `pin_impression_events` - `user_id BIGINT`: user who saw the pin. - `pin_id BIGINT`: pin that was impressed. - `impression_timestamp TIMESTAMP`: time of the impression in UTC. - `impression_type VARCHAR`: impression surface or type, such as `home_feed`, `search`, or `related_pins`. - `impression_count BIGINT`: number of impressions represented by the row. Definitions: - A pin is fresh if it receives at least 2 total impressions within 7 days of `created_timestamp`, using the window `[created_timestamp, created_timestamp + interval 7 days)`. - A user saw fresh content if the user had at least one impression of a fresh pin during the reporting window. - The denominator is distinct users with at least one impression during the reporting window. - Use parameters `:start_timestamp` and `:end_timestamp` for the reporting window. Write a SQL query that returns: - `total_users BIGINT` - `users_who_saw_fresh_content BIGINT` - `pct_users_who_saw_fresh_content DOUBLE` Task 3: Python similar pins by board co-occurrence You are given a dictionary `boards` mapping each board id to the unique pins saved on that board. For a given pin, compute the similarity score between the given pin and every other pin. The similarity score is the number of boards on which the other pin appears together with the given pin. Example input: `boards = {b1: [a, c, d, e], b2: [a, c, e, f], b3: [a, c, g, i]}` and `given_pin = a`. Expected scores: - `c = 3` - `e = 2` - `d = 1` - `f = 1` - `g = 1` - `i = 1` Do not output pins with similarity score 0. Return pins with their scores, sorted by descending score and then by ascending pin id. If an optional argument `n` is provided, return only the top `n` pins. Task 4: Python or pandas category with highest average time spent for video pins You are given a pandas DataFrame `engagement` with columns: - `user_id` - `pin_id` - `pin_type`: for example `image` or `video`. - `category_id` - `time_spent_seconds` You are also given a dictionary `category_mapping` from `category_id` to a human-readable category name. For rows whose `category_id` is missing from the dictionary or null, map the category to `unknown`. Find the category with the highest average `time_spent_seconds` among video pins only. Return the category name, average time spent, and number of video-pin rows used for that category. Specify how you would handle ties.

Quick Answer: This question evaluates SQL-based data manipulation and Python set/co-occurrence computation skills, covering aggregation, joins, timestamp-based cohorting and windowing, freshness detection, CTR calculation, and similarity scoring within a data scientist context.

Last updated: May 7, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • Careers
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Write SQL for top categories and highly active users - Pinterest (easy)
  • Compute percent of first-cancelled users who never rebook - Pinterest (medium)
  • Compute percent of first-cancel users who never return - Pinterest (easy)
  • Find top category by video time spent - Pinterest (Medium)
  • Write SQL to compute max-overlap lists - Pinterest (Medium)