
Build a baseline classification model from messy data

Last updated: Mar 29, 2026

Quick Overview

This question evaluates skills in practical machine learning engineering, including data cleaning, preprocessing, feature selection, handling mixed numeric and categorical features, and baseline model construction for a binary classification task.


Company: Coinbase

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite


Dec 4, 2025, 12:00 AM

In a live notebook (e.g., Jupyter), you are given a messy, real-world tabular dataset for a binary classification problem.

Data characteristics

  • Target label: y ∈ {0,1}
  • Mix of numeric and categorical features
  • Missing values, inconsistent strings (e.g., "NA", empty), and possible outliers
  • Some columns may be identifiers (e.g., user_id, transaction_id) and should not be used as predictive features
  • Dataset is “medium-sized” (fits in memory); you can train a simple model quickly
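
One way to handle the cleaning issues listed above is sketched here with pandas. It is a minimal illustration under stated assumptions, not a reference solution: the token list in MISSING_TOKENS, the 95% "parses as a number" threshold, the lowercasing of categories, and the demo column names (amount, country) are all hypothetical choices. Outlier handling is deliberately left out, since clipping bounds should be learned from the training split only.

```python
import numpy as np
import pandas as pd

# Placeholder tokens treated as missing (an assumed list; extend it for real data).
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none", "?"}


def _normalize(value):
    """Strip strings and map placeholder tokens to NaN; leave other values alone."""
    if isinstance(value, str):
        value = value.strip()
        if value.lower() in MISSING_TOKENS:
            return np.nan
    return value


def clean_frame(df, id_cols, target="y"):
    """Drop identifier columns, unify missing markers, coerce numeric-looking text."""
    out = df.drop(columns=[c for c in id_cols if c in df.columns]).copy()
    for col in out.columns:
        if col == target or pd.api.types.is_numeric_dtype(out[col]):
            continue
        s = out[col].map(_normalize)
        as_num = pd.to_numeric(s, errors="coerce")
        n_present = s.notna().sum()
        # Assumption: a text column is "really numeric" if >= 95% of its values parse.
        if n_present > 0 and as_num.notna().sum() >= 0.95 * n_present:
            out[col] = as_num.astype(float)
        else:
            # Assumption: category labels are case-insensitive ("US" == "us").
            out[col] = s.map(lambda v: str(v).lower() if pd.notna(v) else np.nan)
    # Rows without a label cannot be used for supervised training.
    return out.dropna(subset=[target])


# Tiny made-up frame exercising each problem from the list above.
raw = pd.DataFrame({
    "user_id": [101, 102, 103, 104, 105],
    "amount": ["12.5", "NA", " 7 ", "", "1000"],
    "country": ["US", "us ", "N/A", "DE", None],
    "y": [0, 1, 0, 1, 0],
})
clean = clean_frame(raw, id_cols=["user_id"])
```

When loading from CSV, the same token list can also be passed to `pd.read_csv` through its `na_values` parameter so that most placeholders are converted at read time.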

Task

Within the session, produce a working end-to-end baseline that:

  1. Loads the data and performs minimal but correct cleaning.
  2. Splits data into train/validation (and optionally test) without leakage.
  3. Builds a simple model that can handle mixed feature types (or uses preprocessing to enable this).
  4. Evaluates performance with an appropriate metric (e.g., ROC-AUC / PR-AUC / F1, depending on class imbalance).
  5. Briefly explains your choices (feature selection, preprocessing, model choice, and how you’d improve it if given more time).
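
The steps above could be assembled along the following lines with scikit-learn. This is a hedged sketch, not the expected answer: the data is synthetic, the column names (amount, age_days, channel) are invented for illustration, and logistic regression with median imputation and one-hot encoding is just one reasonable baseline. Putting the imputers, scaler, and encoder inside a single Pipeline means they are fitted on the training split only, which is what keeps preprocessing statistics from leaking into validation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the cleaned dataset (all column names are hypothetical).
rng = np.random.default_rng(0)
n = 2000
amount = rng.lognormal(mean=3.0, sigma=1.0, size=n)
age_days = rng.integers(1, 2000, size=n).astype(float)
channel = rng.choice(["web", "ios", "android"], size=n)
logit = 0.8 * np.log(amount) - 0.001 * age_days + 0.7 * (channel == "web") - 3.0
labels = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
# Inject roughly 10% missing values into one numeric and one categorical column.
amount[rng.random(n) < 0.1] = np.nan
channel = channel.astype(object)
channel[rng.random(n) < 0.1] = np.nan
df = pd.DataFrame({"amount": amount, "age_days": age_days,
                   "channel": channel, "y": labels})

X, y = df.drop(columns="y"), df["y"]
numeric_cols = X.select_dtypes(include="number").columns.tolist()
categorical_cols = [c for c in X.columns if c not in numeric_cols]

# Stratify so both splits keep the same class balance.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median", add_indicator=True)),
    ("scale", StandardScaler()),
])
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
model.fit(X_train, y_train)  # all preprocessing statistics come from train only

proba = model.predict_proba(X_val)[:, 1]
roc_auc = roc_auc_score(y_val, proba)
pr_auc = average_precision_score(y_val, proba)
prevalence = float(y_val.mean())  # PR-AUC of a random ranker is about this value
print(f"ROC-AUC={roc_auc:.3f}  PR-AUC={pr_auc:.3f}  positive rate={prevalence:.3f}")
```

Two caveats worth stating in the interview: a random stratified split assumes rows are independent, so if the same user or time period appears in many rows, a group-based or time-based split would be the safer choice; and `class_weight="balanced"` changes the scale of the predicted probabilities, which does not affect ranking metrics such as ROC-AUC but does matter if a threshold-based metric like F1 is reported.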

You may choose only a few features if that helps you deliver a robust, working solution quickly.
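
A quick screen can help decide which few features to keep. The sketch below is one possible heuristic, not a prescribed method: the function name screen_features, the thresholds (more than 50% missing, more than 90% unique values), and the demo columns are assumptions made for illustration. It drops columns that are mostly empty, constant, or nearly unique in a non-float column, the last being a common sign of an identifier.

```python
import pandas as pd


def screen_features(df, target="y", max_missing=0.5, max_unique_ratio=0.9):
    """Return (kept, dropped); dropped maps each rejected column to a reason."""
    kept, dropped = [], {}
    n_rows = len(df)
    for col in df.columns:
        if col == target:
            continue
        s = df[col]
        n_unique = s.nunique(dropna=True)
        if s.isna().mean() > max_missing:
            dropped[col] = "too many missing values"
        elif n_unique <= 1:
            dropped[col] = "constant"
        # Continuous floats are naturally near-unique, so only flag other dtypes.
        elif not pd.api.types.is_float_dtype(s) and n_unique / n_rows > max_unique_ratio:
            dropped[col] = "near-unique (likely an identifier)"
        else:
            kept.append(col)
    return kept, dropped


# Made-up frame with one column per failure mode.
demo = pd.DataFrame({
    "transaction_id": [f"t{i}" for i in range(10)],
    "amount": [1.5, 2.0, 3.2, 4.1, 5.0, 6.3, 7.7, 8.8, 9.1, 10.4],
    "country": ["us", "de", "us", "fr", "us", "de", "us", "fr", "us", "de"],
    "mostly_empty": [None] * 8 + [1.0, 2.0],
    "constant_flag": [1] * 10,
    "y": [0, 1, 0, 1, 0, 0, 1, 0, 1, 0],
})
kept, dropped = screen_features(demo)
print("kept:", kept)
print("dropped:", dropped)
```

The screen uses no label information, so running it before the train/validation split does not leak the target; any selection step that does look at y should be fitted on the training split only.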

