PracHub

Design and validate a cost-sensitive classifier

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competence in cost-sensitive binary classification, handling delayed and imbalanced labels, calibration and decision-threshold selection, distributional-drift detection, monitoring, and low-latency deployment within the Machine Learning / Data Science domain.

  • hard
  • Snowflake
  • Machine Learning
  • Data Scientist


Company: Snowflake

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen



Snowflake
Oct 13, 2025, 9:49 PM

Binary Purchase Prediction with Delayed Labels and Imbalanced Classes

Context

  • Goal: Ship a real-time binary classifier that predicts whether a user will purchase within the next 7 days.
  • Class imbalance: Positives ≈ 3%.
  • Label delay: Up to 10 days after the 7-day window (i.e., labels mature up to 17 days after the prediction time).
  • Features: Recent session statistics, counts, and recency by category.
  • Business values (per user):
    • True Positive (TP): +$2 expected margin
    • False Positive (FP): −$0.10 (annoyance/discount cost)
    • False Negative (FN): −$0.50 (missed margin)
    • True Negative (TN): $0
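Write p for the calibrated probability that a user purchases within 7 days. The per-user values above then fix the break-even point for targeting. The short derivation below assumes the four values are constant per-user expectations: V_TP = 2, V_FP = −0.10, V_FN = −0.50, V_TN = 0, none of which depend on p.

```latex
\begin{aligned}
\mathbb{E}[\text{profit} \mid \text{target}]    &= p\,V_{TP} + (1-p)\,V_{FP} = 2p - 0.10\,(1-p) \\
\mathbb{E}[\text{profit} \mid \text{no target}] &= p\,V_{FN} + (1-p)\,V_{TN} = -0.50\,p \\
\text{target} &\iff 2p - 0.10\,(1-p) \ge -0.50\,p \iff 2.6\,p \ge 0.10 \\
t^{*} &= \frac{V_{TN} - V_{FP}}{(V_{TP} - V_{FN}) + (V_{TN} - V_{FP})} = \frac{0.10}{2.50 + 0.10} = \frac{1}{26} \approx 0.0385
\end{aligned}
```

The break-even t* ≈ 3.85% sits only slightly above the ~3% base rate. A model that is miscalibrated near the base rate can therefore push many users across the threshold, so calibration matters here as much as ranking quality.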

Tasks

A) Propose an end-to-end training and evaluation design that avoids leakage under delayed labels. Specify an exact time-based cross-validation scheme (fold boundaries, feature and label windows) and explain why it’s unbiased.
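An expanding-window, walk-forward split over snapshot dates t is one way to make the fold boundaries concrete. The sketch below assumes:

- daily snapshots;
- features computed only from events strictly before t;
- a label defined as a purchase in [t, t + 7d) that is final by t + 17d.

A training snapshot is admissible for a fold only if its label was already final when the validation period begins (t + 17d ≤ first validation snapshot). Validation snapshots are used only once their own labels are final at the data cutoff. The names `Fold`, `make_folds`, `feature_window`, and `label_is_mature` are illustrative, not from any library.

```python
from dataclasses import dataclass
from datetime import date, timedelta

LABEL_WINDOW = timedelta(days=7)       # label: purchase in [t, t + 7d)
LABEL_DELAY = timedelta(days=10)       # worst-case delay before a label is final
MATURITY = LABEL_WINDOW + LABEL_DELAY  # label for snapshot t is final by t + 17d


@dataclass(frozen=True)
class Fold:
    train_first: date  # earliest training snapshot (inclusive)
    train_last: date   # latest training snapshot (inclusive)
    valid_first: date  # first validation snapshot (inclusive)
    valid_last: date   # last validation snapshot (inclusive)


def feature_window(t: date, lookback_days: int) -> tuple[date, date]:
    """Events used for features at snapshot t: [t - lookback, t), never on or after t."""
    return t - timedelta(days=lookback_days), t


def label_is_mature(t: date, as_of: date) -> bool:
    """True if the 7-day label for snapshot t is final as of `as_of`."""
    return t + MATURITY <= as_of


def make_folds(first_snapshot: date, data_cutoff: date, n_folds: int,
               valid_days: int = 7) -> list[Fold]:
    """Expanding-window walk-forward folds, built backwards from the data cutoff."""
    folds = []
    valid_last = data_cutoff - MATURITY  # newest snapshot with a final label
    for _ in range(n_folds):
        valid_first = valid_last - timedelta(days=valid_days - 1)
        # Train only on snapshots whose labels were final before validation starts;
        # this also keeps every training outcome window out of the validation period.
        train_last = valid_first - MATURITY
        if train_last < first_snapshot:
            break
        folds.append(Fold(first_snapshot, train_last, valid_first, valid_last))
        valid_last = valid_first - timedelta(days=1)  # next fold is one block earlier
    return list(reversed(folds))  # oldest fold first
```

Each fold uses only labels observable before its first validation snapshot, so it reproduces what would have been known on that deployment date and the estimate is not inflated by future information. The 17-day gap also ensures that no training outcome window overlaps the validation period. Recent snapshots whose labels have not matured are dropped rather than labeled negative, which avoids biasing the positive rate downward.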

B) Choose offline metrics and describe how to calibrate the model (e.g., Platt scaling or isotonic regression). Provide the formula for selecting the decision threshold that maximizes expected profit under the given costs, and explain how you would assess threshold stability across cohorts.
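Below is a dependency-free sketch of the calibration and threshold steps. It assumes raw model scores and 0/1 labels from a held-out, time-later fold, and does three things:

- fits isotonic regression with the pool-adjacent-violators algorithm (a step function here; scikit-learn's `IsotonicRegression` interpolates linearly between steps instead);
- computes the closed-form profit threshold from the given values;
- sweeps a threshold grid per cohort.

The costs are the same for every user, so a well-calibrated model implies the same optimal threshold in every cohort. A cohort whose empirical optimum sits far from t* therefore signals miscalibration for that cohort. All function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import groupby

V_TP, V_FP, V_FN, V_TN = 2.0, -0.10, -0.50, 0.0  # per-user values from the problem


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: non-decreasing step map from raw score to P(y=1)."""
    blocks = []  # each block: [sum_of_labels, count, largest_score_in_block]
    for s, group in groupby(sorted(zip(scores, labels)), key=lambda pair: pair[0]):
        ys = [y for _, y in group]  # tied scores always share one block
        blocks.append([float(sum(ys)), len(ys), s])
        # Merge backwards while the previous block's mean exceeds the newest one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            y_sum, n, s_max = blocks.pop()
            blocks[-1][0] += y_sum
            blocks[-1][1] += n
            blocks[-1][2] = s_max
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def predict_isotonic(model, score):
    edges, values = model
    i = bisect_left(edges, score)  # first block whose upper edge is >= score
    return values[min(i, len(values) - 1)]  # clip scores above the fitted range


def profit_threshold(v_tp=V_TP, v_fp=V_FP, v_fn=V_FN, v_tn=V_TN):
    """Act iff p >= (V_TN - V_FP) / ((V_TP - V_FN) + (V_TN - V_FP)); here 0.1 / 2.6."""
    return (v_tn - v_fp) / ((v_tp - v_fn) + (v_tn - v_fp))


def realized_profit(probs, labels, t):
    total = 0.0
    for p, y in zip(probs, labels):
        if p >= t:
            total += V_TP if y else V_FP
        else:
            total += V_FN if y else V_TN
    return total


def best_threshold(probs, labels, grid):
    return max(grid, key=lambda t: realized_profit(probs, labels, t))


def thresholds_by_cohort(rows, grid):
    """rows: iterable of (cohort, calibrated_prob, label). Returns cohort -> best t."""
    by_cohort = defaultdict(lambda: ([], []))
    for cohort, p, y in rows:
        by_cohort[cohort][0].append(p)
        by_cohort[cohort][1].append(y)
    return {c: best_threshold(ps, ys, grid) for c, (ps, ys) in by_cohort.items()}
```

With ~3% positives, a per-cohort grid sweep is noisy. Bootstrapping each cohort's profit curve and checking whether t* falls inside its near-optimal region is likely more robust than comparing point estimates.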

C) Handle distribution shift: outline drift detection on covariates and on calibration (e.g., PSI, ECE). Propose an online monitoring dashboard with guardrails.
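Below are minimal, dependency-free versions of the two drift signals named above, as a sketch:

- **PSI** compares binned feature (or score) distributions against a reference window, with bins fixed at reference quantiles.
- **ECE** compares the mean predicted probability with the observed purchase rate in each probability bin.

ECE needs matured labels, so it lags by up to 17 days; PSI on covariates and on the score distribution can run daily. The often-quoted PSI rule of thumb (below 0.1 stable, 0.1 to 0.25 moderate shift, above 0.25 major shift) is a convention, not a guarantee. Guardrail thresholds should be tuned on historical data.

```python
import math
from bisect import bisect_right


def quantile_edges(reference, n_bins=10):
    """Interior bin edges at quantiles of the reference (training-window) sample."""
    ref = sorted(reference)
    return sorted({ref[int(len(ref) * k / n_bins)] for k in range(1, n_bins)})


def psi(reference, current, edges, eps=1e-4):
    """Population stability index: sum over bins of (a - e) * ln(a / e)."""
    def shares(xs):
        counts = [0] * (len(edges) + 1)
        for x in xs:
            counts[bisect_right(edges, x)] += 1
        return [max(c / len(xs), eps) for c in counts]  # eps avoids log(0)

    expected, actual = shares(reference), shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def ece(probs, labels, n_bins=10):
    """Expected calibration error over equal-width probability bins."""
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # [count, sum_prob, sum_label]
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)
        bins[b][0] += 1
        bins[b][1] += p
        bins[b][2] += y
    n = len(probs)
    # Weight each bin's |mean predicted - observed rate| by its share of samples.
    return sum((c / n) * abs(sp / c - sy / c) for c, sp, sy in bins if c)
```

With ~3% positives and equal-width bins, nearly all predictions land in the lowest bin. A finer grid (or equal-mass bins) around the decision threshold likely gives a more informative ECE.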

D) Latency and interpretability: With a 50 ms p95 budget and 64 MB RAM per request, describe a deployable modeling choice and featurization plan (including any precomputed features) that meets constraints, plus a fallback rule when the model is unavailable.

E) Explain the model and threshold decisions to a non-technical stakeholder and reconcile if they insist on a different threshold. What evidence would you present to align on the target operating point?
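For the stakeholder conversation, a table of operating points on a matured holdout is often more persuasive than the formula. For each candidate threshold (t* and the stakeholder's preferred value), it reports:

- the share of users targeted;
- precision and recall;
- profit per 1,000 users.

It also gives a paired bootstrap interval for the per-user profit difference between two thresholds. This is a minimal sketch that assumes calibrated probabilities and 0/1 labels; all function names are hypothetical.

```python
import random

VALUES = {"tp": 2.0, "fp": -0.10, "fn": -0.50, "tn": 0.0}  # per-user values from the problem


def _outcome(p, y, t):
    if p >= t:
        return "tp" if y else "fp"
    return "fn" if y else "tn"


def operating_point(probs, labels, t):
    """Summary a stakeholder can read: volume, precision, recall, profit per 1k users."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, y in zip(probs, labels):
        counts[_outcome(p, y, t)] += 1
    n = len(labels)
    targeted = counts["tp"] + counts["fp"]
    positives = counts["tp"] + counts["fn"]
    profit = sum(VALUES[k] * c for k, c in counts.items())
    return {
        "threshold": t,
        "targeted_share": targeted / n,
        "precision": counts["tp"] / targeted if targeted else 0.0,
        "recall": counts["tp"] / positives if positives else 0.0,
        "profit_per_1k_users": 1000 * profit / n,
    }


def profit_diff_ci(probs, labels, t_a, t_b, n_boot=1000, alpha=0.05, seed=0):
    """Paired bootstrap CI for mean per-user profit(t_a) - profit(t_b)."""
    diffs = [VALUES[_outcome(p, y, t_a)] - VALUES[_outcome(p, y, t_b)]
             for p, y in zip(probs, labels)]
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]
```

If the interval for the per-user difference excludes zero, the cost of the alternative threshold can be stated in dollars per 1,000 users. If it straddles zero, the two thresholds are practically equivalent on this data, and the choice can rest on secondary concerns such as contact volume.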

