PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

Databricks data-scientist technical-screen question: given a transactions table, compute for each calendar week the number of distinct customers whose year-to-date spend reaches at least $1000, resetting YTD each January and supporting multiple years. It tests running-total window functions, per-customer-then-threshold logic, Monday-week bucketing, and distinct counts in SQL or pandas.

  • easy
  • Databricks
  • Data Manipulation (SQL/Python)
  • Data Scientist

Count weekly customers with ≥$1000 YTD spend

Company: Databricks

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: easy

Interview Round: Technical Screen

##### Question You are given a transaction-level table and must compute a weekly time series of how many customers have reached a year-to-date spending threshold. The problem can be solved in **SQL** or **Python (pandas)**. ## Table **transactions** - `date` (DATE) — transaction date (assume UTC, calendar dates) - `transaction_id` (STRING/INT) — primary key - `customer_id` (STRING/INT) - `dollars` (NUMERIC) — non-negative spend amount in USD ## Definitions - **Week**: a calendar week starting on Monday. Report each week by its `week_start_date` (the Monday of the week). You may pick a different, consistent week boundary if you state it. - **Year-to-date (YTD) spend as of a week**: for a given `customer_id` in calendar year `Y`, the sum of `dollars` from **Jan 1 of year Y** up to and including the **last day of that week**. ## Task For each week present in the data, compute the **number of distinct customers** whose **YTD spend as of that week** is **at least $1000**. 1. **Core metric.** For every week, count the distinct customers whose cumulative spend from the start of that calendar year through the end of the week is ≥ $1000. 2. **Multi-year handling.** If the data spans multiple calendar years, compute the metric **independently per year** — YTD resets to $0 on Jan 1 of each year. Include the `year` in the output. 3. **Ordering.** Order the results by `year`, then `week_start_date`. ## Output Return a table with: - `year` (INT) - `week_start_date` (DATE) - `num_customers_ge_1000_ytd` (INT) (If you assume a single year, the original minimal output of `week_start` and `num_customers_ge_1000_ytd` is also acceptable — the `year` column is what generalizes it to multiple years.)

Quick Answer: Databricks data-scientist technical-screen question: given a transactions table, compute for each calendar week the number of distinct customers whose year-to-date spend reaches at least $1000, resetting YTD each January and supporting multiple years. It tests running-total window functions, per-customer-then-threshold logic, Monday-week bucketing, and distinct counts in SQL or pandas.

Last updated: Jun 15, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Find Top-5 Similar Rows - Databricks (hard)
  • Find top-5 most similar rows across datasets - Databricks (easy)
  • Calculate Second-Degree Followers for Each YouTuber - Databricks (Medium)