
Design a model for subscription adoption prediction

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design and productionize a supervised machine-learning pipeline for predicting subscription adoption, testing competencies in labeling strategy, leakage identification and prevention, feature engineering from transactional and merchant metadata, model selection and calibration, evaluation with time-based validation, and post-deployment monitoring. It is commonly asked in the Machine Learning domain for Data Scientist roles because it assesses practical application and production-ready design along with conceptual understanding of time-based splits, class imbalance, performance metrics, and drift detection.



Company: Stripe

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Take-home Project



Oct 13, 2025, 9:49 PM

Predicting 60-Day Adoption of Subscription by Non-Subscription Merchants

Context

You need to predict which merchants who are not currently using the Subscription product will adopt it within the next 60 days. For the live run, only data available up to 2025-07-03 may be used to predict adoption by 2025-09-01.

Assume you have: transaction/event logs (charges, refunds, disputes, payouts), merchant metadata (signup date, vertical, country), and identifiers like customer_id and card_fingerprint. Assume an event that uniquely indicates Subscription adoption (e.g., first Subscription API event or first Subscription invoice) is available with a timestamp.
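The windows implied by this context can be written down directly. The following is a minimal sketch, not a reference solution: it assumes day-level dates, an inclusive outcome window ending on 2025-09-01, and a hypothetical 28-day history threshold (`MIN_HISTORY_DAYS`) for flagging cold-start merchants. The function and parameter names are illustrative and do not come from the prompt.

```python
from datetime import date, timedelta
from typing import Optional

CUTOFF = date(2025, 7, 3)      # last day of usable data (from the prompt)
OUTCOME_DAYS = 60              # outcome window length (from the prompt)
OUTCOME_END = CUTOFF + timedelta(days=OUTCOME_DAYS)
MIN_HISTORY_DAYS = 28          # hypothetical cold-start threshold (assumption)


def label_merchant(signup: date,
                   first_subscription_event: Optional[date],
                   cutoff: date = CUTOFF,
                   outcome_days: int = OUTCOME_DAYS) -> Optional[int]:
    """Return 1 (adopts in the outcome window), 0 (does not), or None (excluded).

    Merchants are excluded when they did not exist at the cutoff or had
    already adopted Subscription on or before it.
    """
    outcome_end = cutoff + timedelta(days=outcome_days)
    if signup > cutoff:
        return None  # merchant not observable at prediction time
    if first_subscription_event is not None and first_subscription_event <= cutoff:
        return None  # already a Subscription user, not in the target population
    if first_subscription_event is not None and first_subscription_event <= outcome_end:
        return 1     # first adoption falls in (cutoff, outcome_end]
    return 0         # no adoption observed inside the outcome window


def is_cold_start(signup: date, cutoff: date = CUTOFF,
                  min_history_days: int = MIN_HISTORY_DAYS) -> bool:
    """Flag merchants with too little history for windowed features."""
    return (cutoff - signup).days < min_history_days
```

With these defaults, `OUTCOME_END` evaluates to 2025-09-01, which matches the 60-day horizon stated above. For historical training snapshots, the same function can be called with an earlier `cutoff`, provided the outcome window of that snapshot ends before the live cutoff.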

Task

Design a production-ready classification approach and answer concisely:

(a) Labeling: Precisely define positives/negatives and the observation and outcome windows; handle merchants already using Subscription and cold-start merchants.

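(b) Leakage: List at least five concrete leakage risks specific to this data (e.g., using product_type after the label cutoff, refund rates influenced by post-adoption behavior) and how to prevent them via time-based feature windows and proper splits.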

(c) Features: Propose 15–25 high-signal, computable features from transactions (recency/frequency/monetary, 28–35 day repeat patterns, customer concentration, card_fingerprint diversity, weekend share, chargeback/refund rates, growth rates) and from merchant metadata (age, vertical, geo).

(d) Modeling: Choose two models (e.g., regularized logistic vs. gradient-boosted trees); discuss class imbalance handling (weights vs. downsampling), calibration, and interpretability for a sales handoff.

(e) Evaluation: Specify time-based cross-validation, primary metrics (PR-AUC, precision@K, recall@K), and how you would select a threshold to deliver a list of 1,000 merchants with expected precision ≥ 0.60.

(f) Monitoring: Define post-deployment drift and performance checks (data drift on feature distributions, label drift, calibration drift) and how to retrain without contaminating future labels.
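Two of the quantities named in parts (e) and (f) can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not a prescribed answer. `expected_precision_at_k` assumes the scores are already well-calibrated probabilities, so the mean of the top K scores estimates the precision of the delivered list. `psi` computes a Population Stability Index for one numeric feature using quantile cut points taken from the baseline sample. Both function names, the bin count, and the `eps` floor are hypothetical choices.

```python
import math
from bisect import bisect_right


def expected_precision_at_k(calibrated_probs, k):
    """Return (expected precision, score threshold) for the top-k list.

    Assumes calibrated probabilities: the mean of the k highest scores is
    then the expected share of true adopters among the k merchants.
    """
    if k <= 0 or k > len(calibrated_probs):
        raise ValueError("k must be between 1 and the number of scored merchants")
    top = sorted(calibrated_probs, reverse=True)[:k]
    return sum(top) / k, top[-1]  # top[-1] is the lowest score that makes the list


def psi(baseline, live, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a live sample.

    Bins are defined by quantile cut points of the baseline; empty bins are
    floored at eps so the logarithm stays finite.
    """
    ordered = sorted(baseline)
    cuts = [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(cuts, v)] += 1  # bin index in 0..n_bins-1
        return [max(c / len(values), eps) for c in counts]

    base_share, live_share = shares(baseline), shares(live)
    return sum((a - b) * math.log(a / b) for b, a in zip(base_share, live_share))
```

A possible use for the 1,000-merchant list: call `expected_precision_at_k(scores, 1000)` on held-out, calibrated scores from the most recent time-based fold and check whether the first returned value is at least 0.60; the second value is the score threshold that produces that list. The estimate is only as good as the calibration, so it is worth comparing against the realized precision@1000 on earlier folds. For monitoring, `psi` can be run per feature between the training snapshot and each scoring batch, and the same idea applies to the score distribution itself.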

