PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches
|Home/Machine Learning/Coinbase

Build and evaluate a conversion prediction model

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in predictive modeling, feature engineering with target-leakage control, time-aware validation, uplift and treatment-effect estimation, calibration and uncertainty quantification, and end-to-end deployment considerations within the Machine Learning domain.

  • hard
  • Coinbase
  • Machine Learning
  • Data Scientist

Build and evaluate a conversion prediction model

Company: Coinbase

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

You receive a CSV of email exposures with one row per user-email send: cols = user_id, send_ts, treatment_flag, opened, clicked, purchased_within_7d (label), user_region, device_type, tenure_days, prior_sessions_28d, prior_purchases_180d, avg_cart_value_180d, categories_viewed_28d, email_personalization_score, deliverability_score, and other anonymized features. Task: 1) EDA: Identify leakage risks and fix them by ensuring features are computable at send_ts (e.g., avoid using post-send behaviors like opened/clicked unless modeling an uplift chain). Explore class imbalance, missingness, outliers, and high-cardinality categoricals. Propose data quality checks. 2) Modeling: Build a baseline logistic regression and a gradient-boosted tree to predict purchased_within_7d. Use time-based splits: train=2025-06, valid=2025-07, test=2025-08. Use cross-validation on train, tune hyperparameters, and implement monotonicity or regularization as appropriate. 3) Evaluation: Report ROC AUC, PR AUC, calibration (reliability curve, Brier score), and incremental lift at the top 10% scored users versus control. Provide confidence intervals via bootstrap and assess stability by region/device. 4) Deployment: Choose a decision threshold to maximize expected incremental revenue given an email cost of $0.003 and an estimated treatment effect from a calibration experiment. Describe monitoring (data drift, performance drift, alerting), retraining cadence, and next steps to improve (feature engineering, causal uplift modeling, de-biasing via IPS or DR estimators).

Quick Answer: This question evaluates proficiency in predictive modeling, feature engineering with target-leakage control, time-aware validation, uplift and treatment-effect estimation, calibration and uncertainty quantification, and end-to-end deployment considerations within the Machine Learning domain.

Related Interview Questions

  • Explain precision/recall and compute NN output - Coinbase (hard)
  • Build a baseline classification model from messy data - Coinbase (medium)
  • How to Analyze and Model Behavioral Data Effectively? - Coinbase (hard)
Coinbase logo
Coinbase
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Machine Learning
4
0

Predicting 7-Day Purchase After Email Send

Context

You are given a CSV where each row is a user–email send (or scheduled send/control), with columns:

  • user_id, send_ts, treatment_flag (1=sent, 0=control/holdout)
  • opened, clicked, purchased_within_7d (label)
  • user_region, device_type, tenure_days
  • prior_sessions_28d, prior_purchases_180d, avg_cart_value_180d, categories_viewed_28d
  • email_personalization_score, deliverability_score
  • plus other anonymized features

Goal: Build a model that, at send time, predicts whether a user will purchase within 7 days of the email send, then decide whom to send to in order to maximize incremental revenue.

Assume purchased_within_7d is defined as a purchase occurring within [send_ts, send_ts + 7d). For control rows (treatment_flag=0), the window is anchored on the scheduled send_ts.

Tasks

  1. EDA and Leakage Control
  • Identify potential target leakage and ensure all features are computable at send_ts. Avoid using post-send behaviors (e.g., opened, clicked) unless modeling a multi-stage chain with predicted intermediates.
  • Explore class imbalance, missingness, outliers, and high-cardinality categoricals. Propose concrete data quality checks.
  1. Modeling
  • Train two models to predict purchased_within_7d at send time: a baseline logistic regression and a gradient-boosted tree.
  • Use time-based splits: train = 2025-06, valid = 2025-07, test = 2025-08.
  • Within the train month, use time-aware cross-validation and tune hyperparameters. Apply monotonic constraints and/or regularization where appropriate.
  1. Evaluation
  • Report ROC AUC, PR AUC, calibration (reliability curve, Brier score), and incremental lift at the top 10% scored users versus control.
  • Provide 95% confidence intervals via bootstrap and assess stability by user_region and device_type.
  1. Deployment
  • Choose a decision rule that maximizes expected incremental revenue given an email cost of $0.003 and an estimated treatment effect from a calibration experiment.
  • Describe monitoring (data drift, performance drift, alerting), retraining cadence, and next steps (feature engineering, causal uplift modeling, de-biasing via IPS/DR estimators).

Solution

Show

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Machine Learning•More Coinbase•More Data Scientist•Coinbase Data Scientist•Coinbase Machine Learning•Data Scientist Machine Learning
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.