PracHub

Build and evaluate a conversion prediction model

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in predictive modeling, feature engineering with target-leakage control, time-aware validation, uplift and treatment-effect estimation, calibration and uncertainty quantification, and end-to-end deployment considerations within the Machine Learning domain.

Company: Coinbase

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

You receive a CSV of email exposures with one row per user-email send. Columns: user_id, send_ts, treatment_flag, opened, clicked, purchased_within_7d (label), user_region, device_type, tenure_days, prior_sessions_28d, prior_purchases_180d, avg_cart_value_180d, categories_viewed_28d, email_personalization_score, deliverability_score, and other anonymized features.

Task:

  1. EDA: Identify leakage risks and fix them by ensuring features are computable at send_ts (e.g., avoid using post-send behaviors like opened/clicked unless modeling an uplift chain). Explore class imbalance, missingness, outliers, and high-cardinality categoricals. Propose data quality checks.
  2. Modeling: Build a baseline logistic regression and a gradient-boosted tree to predict purchased_within_7d. Use time-based splits: train=2025-06, valid=2025-07, test=2025-08. Use cross-validation on train, tune hyperparameters, and implement monotonicity or regularization as appropriate.
  3. Evaluation: Report ROC AUC, PR AUC, calibration (reliability curve, Brier score), and incremental lift at the top 10% scored users versus control. Provide confidence intervals via bootstrap and assess stability by region/device.
  4. Deployment: Choose a decision threshold to maximize expected incremental revenue given an email cost of $0.003 and an estimated treatment effect from a calibration experiment. Describe monitoring (data drift, performance drift, alerting), retraining cadence, and next steps to improve (feature engineering, causal uplift modeling, de-biasing via IPS or DR estimators).

Oct 13, 2025, 9:49 PM

Predicting 7-Day Purchase After Email Send

Context

You are given a CSV where each row is a user–email send (or scheduled send/control), with columns:

  • user_id, send_ts, treatment_flag (1=sent, 0=control/holdout)
  • opened, clicked, purchased_within_7d (label)
  • user_region, device_type, tenure_days
  • prior_sessions_28d, prior_purchases_180d, avg_cart_value_180d, categories_viewed_28d
  • email_personalization_score, deliverability_score
  • plus other anonymized features

Goal: Build a model that, at send time, predicts whether a user will purchase within 7 days of the email send, then decide whom to send to in order to maximize incremental revenue.

Assume purchased_within_7d is defined as a purchase occurring within [send_ts, send_ts + 7d). For control rows (treatment_flag=0), the window is anchored on the scheduled send_ts.
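
To make the half-open window concrete, here is a minimal sketch of how the label could be derived from raw purchase timestamps. The function signature and the purchase_times input are assumptions for illustration; the CSV in the task already ships the label precomputed.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)

def purchased_within_7d(send_ts, purchase_times):
    """Return 1 if any purchase falls in [send_ts, send_ts + 7d), else 0.

    For control rows, send_ts is the scheduled send time.
    """
    end = send_ts + WINDOW
    # Left edge inclusive, right edge exclusive, matching the stated window.
    return int(any(send_ts <= t < end for t in purchase_times))
```

A purchase exactly at send_ts counts, while one exactly 7 days later does not. Recomputing the label this way on a sample is one way to check that the shipped column follows the same convention.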

Tasks

  1. EDA and Leakage Control
  • Identify potential target leakage and ensure all features are computable at send_ts. Avoid using post-send behaviors (e.g., opened, clicked) unless modeling a multi-stage chain with predicted intermediates.
  • Explore class imbalance, missingness, outliers, and high-cardinality categoricals. Propose concrete data quality checks.
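
The leakage filter and a few of the data quality checks above can be sketched as follows. This is a standard-library illustration, assuming rows are loaded as a list of dicts (for example via csv.DictReader); the helper names and the choice of checks are hypothetical, not a prescribed solution.

```python
from collections import Counter

POST_SEND = {"opened", "clicked"}  # observed only after send_ts
LABEL = "purchased_within_7d"
NON_FEATURES = {"user_id", "send_ts", LABEL}

def send_time_features(columns):
    """Keep columns known at send_ts that are not identifiers or the label."""
    return [c for c in columns if c not in POST_SEND and c not in NON_FEATURES]

def quality_report(rows):
    """Summarize missingness, label rate, and duplicate (user_id, send_ts) keys."""
    n = len(rows)
    columns = list(rows[0].keys())
    # Treat None and empty strings as missing; a literal 0 is a valid value.
    missing = {c: sum(r.get(c) in (None, "") for r in rows) / n for c in columns}
    keys = Counter((r["user_id"], r["send_ts"]) for r in rows)
    return {
        "n_rows": n,
        "positive_rate": sum(int(r[LABEL]) for r in rows) / n,
        "missing_rate": missing,
        "duplicate_keys": sum(count - 1 for count in keys.values()),
    }
```

The positive rate gives a first read on class imbalance, and a nonzero duplicate count suggests the one-row-per-send assumption needs checking before any split is made.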
  2. Modeling
  • Train two models to predict purchased_within_7d at send time: a baseline logistic regression and a gradient-boosted tree.
  • Use time-based splits: train = 2025-06, valid = 2025-07, test = 2025-08.
  • Within the train month, use time-aware cross-validation and tune hyperparameters. Apply monotonic constraints and/or regularization where appropriate.
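
One possible way to implement the month-based split and the time-aware cross-validation is an expanding-window scheme, sketched below with the standard library only. The function names and the equal-sized block layout are assumptions; other forward-chaining schemes would satisfy the task equally well.

```python
from datetime import datetime

SPLITS = {"2025-06": "train", "2025-07": "valid", "2025-08": "test"}

def assign_split(send_ts):
    """Map a send timestamp to train/valid/test by calendar month (None if outside)."""
    return SPLITS.get(send_ts.strftime("%Y-%m"))

def expanding_window_folds(timestamps, n_folds=3):
    """Yield (train_idx, val_idx) pairs where training rows never come after validation rows."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    block = len(order) // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough rows for the requested number of folds")
    for k in range(1, n_folds + 1):
        # The last fold absorbs any remainder rows.
        stop = (k + 1) * block if k < n_folds else len(order)
        yield order[: k * block], order[k * block : stop]
```

Each fold trains on an earlier stretch of June and validates on the stretch immediately after it, so hyperparameters are tuned without looking at later data. Rows sharing an identical timestamp can still land on both sides of a boundary, which may matter if sends are batched.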
  3. Evaluation
  • Report ROC AUC, PR AUC, calibration (reliability curve, Brier score), and incremental lift at the top 10% scored users versus control.
  • Provide 95% confidence intervals via bootstrap and assess stability by user_region and device_type.
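
A rough sketch of three of the evaluation pieces named above: the Brier score, incremental lift in the top-scored fraction (treated minus control conversion rate), and a percentile bootstrap interval. It uses only the standard library; the function names and the row-wise resampling layout are assumptions, and ROC AUC and PR AUC are left to an existing metrics library.

```python
import random

def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def incremental_lift_at_top(scores, treated, converted, fraction=0.10):
    """Treated minus control conversion rate among the top-scored fraction of rows."""
    n_top = max(1, int(len(scores) * fraction))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n_top]
    t = [converted[i] for i in top if treated[i] == 1]
    c = [converted[i] for i in top if treated[i] == 0]
    if not t or not c:
        return None  # undefined unless both arms appear in the slice
    return sum(t) / len(t) - sum(c) / len(c)

def bootstrap_ci(metric, columns, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; `columns` holds equal-length lists resampled row-wise."""
    rng = random.Random(seed)
    n = len(columns[0])
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric(*[[col[i] for i in idx] for col in columns])
        if value is not None:  # skip resamples where the metric is undefined
            stats.append(value)
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi
```

For the stability check, the same functions can be applied to the rows of each user_region or device_type separately. If one user contributes several sends, resampling users rather than rows is likely the safer choice, since rows from the same user are not independent.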
  4. Deployment
  • Choose a decision rule that maximizes expected incremental revenue given an email cost of $0.003 and an estimated treatment effect from a calibration experiment.
  • Describe monitoring (data drift, performance drift, alerting), retraining cadence, and next steps (feature engineering, causal uplift modeling, de-biasing via IPS/DR estimators).
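
As a hedged illustration of the decision rule and one drift metric, the sketch below sends an email only when the estimated uplift times the value of a conversion exceeds the $0.003 cost, and computes a population stability index (PSI) over pre-binned feature proportions. The value_per_conversion parameter, the per-user uplift input, and the choice of PSI as the drift metric are all assumptions; the task does not specify them.

```python
import math

EMAIL_COST = 0.003  # dollars per send, from the task statement

def expected_incremental_revenue(uplift, value_per_conversion, cost=EMAIL_COST):
    """Uplift is P(purchase | send) - P(purchase | no send) for this user or bucket."""
    return uplift * value_per_conversion - cost

def should_send(uplift, value_per_conversion, cost=EMAIL_COST):
    """Send only when the expected incremental revenue is positive."""
    return expected_incremental_revenue(uplift, value_per_conversion, cost) > 0

def breakeven_uplift(value_per_conversion, cost=EMAIL_COST):
    """Smallest uplift at which a send pays for itself."""
    return cost / value_per_conversion

def population_stability_index(expected_props, actual_props, eps=1e-6):
    """PSI between two binned distributions given as lists of bin proportions."""
    total = 0.0
    for e, a in zip(expected_props, actual_props):
        e, a = max(e, eps), max(a, eps)  # floor to avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total
```

With an assumed $30 value per conversion, the breakeven uplift is 0.003 / 30 = 0.0001, so the rule is driven by the estimated treatment effect rather than by the raw purchase probability. A PSI of zero means the binned distributions match; the alerting threshold is a tuning choice for the monitoring setup.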