PracHub

Predict Seller Intent From Subscription Data

Last updated: May 20, 2026

Quick Overview

This question evaluates a data scientist's ability to define target labels from temporal data, handle timestamp edge cases and missingness, engineer features from categorical, free-text, and user-agent fields, and build and explain an interpretable binary classification model.

Company: Squarespace

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite

Oct 1, 2024, 12:00 AM

You are given a take-home dataset, `seller_intent_take_home_dataset.csv`, containing about 5,000 new subscription records from a website-building platform. The business goal is to understand which factors are associated with a new user becoming an active seller and to build an interpretable prediction model for seller intent.

The dataset has the following columns:

| Column | Type | Description |
|---|---:|---|
| `subscription_id` | string | Unique subscription identifier. |
| `subscription_start` | timestamp | When the subscription officially started. |
| `first_sale` | timestamp, nullable | When the user made their first sale, if any. |
| `days_in_trial` | integer | Number of days the user spent in trial. |
| `subscription_plan` | categorical | Subscription plan selected. |
| `subscription_period` | categorical | Billing period, such as monthly or annual. |
| `discount_amount` | numeric | Discount amount applied to the subscription. |
| `country` | categorical | User country. |
| `site_topic` | categorical/free text, nullable | User-provided site topic from onboarding. |
| `site_need` | categorical/free text, nullable | User-provided reason or need for creating the site. |
| `user_agent` | string | Raw browser user-agent string. |
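
A minimal sketch of loading rows with this schema and measuring missingness in the nullable columns. It assumes ISO 8601 timestamps and empty strings for missing values; the question states neither, so check both against the real file. The helper names (`parse_ts`, `load_rows`, `missing_share`) and the two sample rows are hypothetical, and only a subset of the columns is shown.

```python
import csv
import io
from datetime import datetime
from typing import Optional

NULLABLE = ("first_sale", "site_topic", "site_need")


def parse_ts(raw: str) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; an empty or blank cell means missing."""
    raw = raw.strip()
    return datetime.fromisoformat(raw) if raw else None


def load_rows(handle):
    """Read CSV rows and convert the typed columns (assumed formats)."""
    rows = []
    for rec in csv.DictReader(handle):
        rec["subscription_start"] = parse_ts(rec["subscription_start"])
        rec["first_sale"] = parse_ts(rec["first_sale"])
        rec["days_in_trial"] = int(rec["days_in_trial"])
        rec["discount_amount"] = float(rec["discount_amount"])
        # Turn blank free-text answers into None so missingness is explicit.
        for col in ("site_topic", "site_need"):
            rec[col] = rec[col].strip() or None
        rows.append(rec)
    return rows


def missing_share(rows, columns=NULLABLE):
    """Fraction of rows in which each column is None (rows must be non-empty)."""
    n = len(rows)
    return {c: sum(r[c] is None for r in rows) / n for c in columns}


# Made-up sample rows, only to show the call pattern.
sample = io.StringIO(
    "subscription_id,subscription_start,first_sale,days_in_trial,"
    "discount_amount,site_topic,site_need\n"
    "s1,2024-01-05 10:00:00,2024-01-20 09:30:00,14,0,art,sell products\n"
    "s2,2024-01-07 08:15:00,,7,12.5,,\n"
)
rows = load_rows(sample)
print(missing_share(rows))
```

For the real file, the same functions can be called on `open("seller_intent_take_home_dataset.csv", newline="")`.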

Important data issues:

  • There is no existing `is_seller` label. You must define the target variable using `first_sale` and `subscription_start`.
  • Some rows have inconsistent timestamps, including cases where `first_sale` occurs before `subscription_start`.
  • `site_topic` and `site_need` have substantial missingness, roughly 28% to 32%, and many long-tail categories.
  • `user_agent` is a raw string, so useful features such as device type, operating system, and browser must be extracted.
  • The final notebook or report must be concise enough to present clearly and should prioritize data reasoning, feature engineering, and interpretability over complex hyperparameter tuning.
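
One possible labeling rule for the first two issues above, sketched under explicit assumptions. A missing `first_sale` is labeled 0. A sale recorded before `subscription_start` is still labeled 1 but flagged, so those rows can be excluded in a sensitivity check. The optional `horizon_days` window is an assumption that the question does not specify, and the function name and status strings are hypothetical. Dropping or relabeling the inconsistent rows are equally defensible choices as long as the report states which one was made.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def label_seller(
    subscription_start: datetime,
    first_sale: Optional[datetime],
    horizon_days: Optional[int] = None,
) -> Tuple[int, str]:
    """Return (is_seller, timestamp_status) for one subscription record."""
    if first_sale is None:
        # No recorded sale: treated as a non-seller (assumption).
        return 0, "no_sale"
    if first_sale < subscription_start:
        # Inconsistent ordering: keep the positive label but flag the row.
        return 1, "sale_before_start"
    if horizon_days is not None:
        # Optional fixed window so early and late cohorts are comparable.
        if first_sale - subscription_start > timedelta(days=horizon_days):
            return 0, "sale_after_horizon"
    return 1, "ok"


start = datetime(2024, 1, 5)
print(label_seller(start, None))
print(label_seller(start, datetime(2024, 1, 20)))
print(label_seller(start, datetime(2024, 1, 2)))
print(label_seller(start, datetime(2024, 6, 1), horizon_days=30))
```

Returning the status next to the label makes it easy to report how many rows fall into each edge case and to refit the model with the flagged rows removed.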

Deliverables:

  1. Define the target variable and explicitly state how you handle timestamp edge cases.
  2. Perform exploratory data analysis focused on data quality, missingness, class balance, and feature distributions.
  3. Engineer useful features from categorical fields, timestamps, discounts, trial length, and `user_agent`.
  4. Train an interpretable model, such as logistic regression, and evaluate it using appropriate classification and ranking metrics.
  5. Explain the main drivers of predicted seller intent, while distinguishing predictive associations from causal claims.
  6. Prepare a short presentation explaining your assumptions, modeling choices, results, limitations, and recommended next steps.
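
The feature-engineering deliverable can be approached in several ways. The sketch below is one dependency-free option, not a reference solution. It pulls coarse device, operating system, and browser labels out of `user_agent` with regular expressions, and collapses long-tail `site_topic` or `site_need` values into an `other` bucket while keeping missing values as their own level. The rule lists, function names, and the `min_count` threshold are all illustrative assumptions, and the regex heuristics will misclassify some user agents (for example, Android tablets).

```python
import re
from collections import Counter

# Order matters: Edge and Opera strings also contain "Chrome",
# and Chrome strings also contain "Safari".
BROWSER_RULES = [
    ("Edge", r"Edg"),
    ("Opera", r"OPR|Opera"),
    ("Firefox", r"Firefox|FxiOS"),
    ("Chrome", r"Chrome|CriOS"),
    ("Safari", r"Safari"),
]
# Android strings contain "Linux"; iOS strings contain "like Mac OS X".
OS_RULES = [
    ("Windows", r"Windows"),
    ("Android", r"Android"),
    ("iOS", r"iPhone|iPad|iPod"),
    ("macOS", r"Mac OS X|Macintosh"),
    ("Linux", r"Linux"),
]


def first_match(rules, text):
    """Return the name of the first rule whose pattern occurs in text."""
    for name, pattern in rules:
        if re.search(pattern, text):
            return name
    return "other"


def parse_user_agent(ua):
    """Heuristically extract device, OS, and browser from a raw UA string."""
    if not ua or not ua.strip():
        return {"device": "unknown", "os": "unknown", "browser": "unknown"}
    if re.search(r"iPad|Tablet", ua):
        device = "tablet"
    elif re.search(r"Mobi|iPhone|Android", ua):
        device = "mobile"
    else:
        device = "desktop"
    return {
        "device": device,
        "os": first_match(OS_RULES, ua),
        "browser": first_match(BROWSER_RULES, ua),
    }


def bucket_rare(values, min_count=25, missing="missing", other="other"):
    """Map None/blank to `missing` and levels rarer than min_count to `other`."""
    cleaned = [v.strip().lower() if v and v.strip() else missing for v in values]
    counts = Counter(cleaned)
    return [v if v == missing or counts[v] >= min_count else other for v in cleaned]


chrome_windows = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
print(parse_user_agent(chrome_windows))
print(bucket_rare(["Art", "art", None, "Yoga", ""], min_count=2))
```

Keeping `missing` as an explicit level lets the model learn whether skipping the onboarding question is itself informative. To avoid leakage, the level counts should be computed on the training split only and then applied to validation data.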
