PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates proficiency in SQL-based feature engineering for time-series and user-event data, covering snapshotting, temporal joins, deduplication, windowed aggregations, distinct counting, and label construction in the Data Manipulation (SQL/Python) domain.

  • Medium
  • Stripe
  • Data Manipulation (SQL/Python)
  • Data Scientist

Write SQL for snapshot features and labels

Company: Stripe

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

Use 'today' = 2025-09-01. Invented schema (UTC): users(u_id INT, signup_dt DATE, country STRING) events(u_id INT, event_time TIMESTAMP, arrival_time TIMESTAMP, event_type STRING, session_id STRING) orders(order_id INT, u_id INT, order_time TIMESTAMP, amount DECIMAL(10,2)) Sample rows: users u_id | signup_dt | country 1 | 2025-07-20 | US 2 | 2025-08-25 | CA 3 | 2025-08-30 | US events u_id | event_time | arrival_time | event_type | session_id 1 | 2025-08-29 10:00:00 | 2025-08-29 10:00:02 | view | a 1 | 2025-08-31 09:00:00 | 2025-08-31 09:00:03 | add_to_cart | a 1 | 2025-09-01 12:00:00 | 2025-09-01 12:00:01 | view | b 2 | 2025-08-30 15:00:00 | 2025-08-30 15:00:02 | view | c 2 | 2025-09-01 09:30:00 | 2025-09-01 09:30:01 | view | d 3 | 2025-08-31 23:59:00 | 2025-09-02 00:00:05 | view | e (late-arriving) 3 | 2025-09-01 08:00:00 | 2025-09-01 08:00:00 | view | e 3 | 2025-09-01 08:00:00 | 2025-09-01 08:00:00 | view | e (duplicate) orders order_id | u_id | order_time | amount 101 | 1 | 2025-09-10 13:00:00 | 120.00 102 | 2 | 2025-08-31 16:00:00 | 50.00 103 | 2 | 2025-09-03 11:00:00 | 60.00 Task: Write one SQL query that produces a training dataset at anchor snapshot_ts = '2025-09-01 00:00:00' with columns: (u_id, country, days_since_signup, last_event_time, events_7d, sessions_7d, added_to_cart_7d, y_30d). Rules: (A) Features use only events with event_time < snapshot_ts AND arrival_time <= snapshot_ts; drop duplicates by (u_id, event_time, event_type, session_id). (B) Compute 7-day windows over [snapshot_ts - 7 days, snapshot_ts). (C) sessions_7d counts distinct session_id in window; events_7d counts distinct (event_time, event_type, session_id). (D) y_30d = 1 if there exists an order in [snapshot_ts, snapshot_ts + INTERVAL '30' DAY), else 0. (E) Include users with no events (fill nulls/zeros appropriately). Show the exact row outputs for the provided sample data and explain why user 3's late-arriving 2025-08-31 event must be excluded from features.

Quick Answer: This question evaluates proficiency in SQL-based feature engineering for time-series and user-event data, covering snapshotting, temporal joins, deduplication, windowed aggregations, distinct counting, and label construction in the Data Manipulation (SQL/Python) domain.

Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Design an idempotent SQL ETL for late data - Stripe (Medium)
  • Write SQL to monitor weekly chargeback spikes - Stripe (Medium)
  • Write SQL to detect recurring non-subscription users - Stripe (Medium)
  • Design metrics and write SQL for a case - Stripe (Medium)
  • Design payment-to-invoice matcher with priorities - Stripe (Medium)