This question evaluates skills in practical machine learning engineering, including data cleaning, preprocessing, feature selection, handling mixed numeric and categorical features, and baseline model construction for a binary classification task.
In a live notebook (e.g., Jupyter), you are given a messy, real-world tabular dataset for a binary classification problem.
Data characteristics
y ∈ {0,1}
Some columns are identifiers (e.g., user_id, transaction_id) and should not be used as predictive features
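One way to keep identifier columns out of the feature matrix is to drop them explicitly before any modeling. The sketch below assumes pandas and a toy frame: only y, user_id, and transaction_id come from the problem statement, while the amount and channel columns are hypothetical stand-ins for the real features.

```python
import pandas as pd

# Hypothetical toy frame. Only y, user_id and transaction_id are named in
# the task; amount and channel are invented here for illustration.
df = pd.DataFrame({
    "user_id": [101, 102, 103, 104],
    "transaction_id": ["t1", "t2", "t3", "t4"],
    "amount": [12.5, None, 7.0, 30.0],       # assumed numeric feature
    "channel": ["web", "app", None, "web"],  # assumed categorical feature
    "y": [0, 1, 0, 1],
})

ID_COLUMNS = ["user_id", "transaction_id"]
TARGET = "y"

# errors="ignore" keeps the drop safe if an identifier column is absent.
X = df.drop(columns=ID_COLUMNS + [TARGET], errors="ignore")
y = df[TARGET].astype(int)

# Guard against label values outside {0, 1}.
assert set(y.unique()) <= {0, 1}
```

Listing the identifiers in one place makes the exclusion easy to audit and to extend if more ID-like columns turn up during exploration.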
Task
Within the session, produce a working end-to-end baseline that:
You may choose only a few features if that helps you deliver a robust, working solution quickly.
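A small-feature baseline of this kind might look like the sketch below. It assumes scikit-learn and pandas, uses synthetic data with two hypothetical features (amount, channel), and picks logistic regression and hold-out ROC AUC as the model and metric; none of these choices come from the problem statement.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the messy dataset (all columns except y and
# user_id are assumptions made for this sketch).
rng = np.random.default_rng(0)
n = 400
amount = rng.normal(50, 20, n)
channel = rng.choice(["web", "app", "store"], n)
label = (amount + 15 * (channel == "app") + rng.normal(0, 10, n) > 55).astype(int)
df = pd.DataFrame({"user_id": np.arange(n), "amount": amount,
                   "channel": channel, "y": label})
df["channel"] = df["channel"].astype(object)

# Inject missing values to mimic real-world messiness.
df.loc[rng.choice(n, 40, replace=False), "amount"] = np.nan
df.loc[rng.choice(n, 40, replace=False), "channel"] = np.nan

numeric = ["amount"]        # a deliberately short feature list
categorical = ["channel"]   # user_id is excluded as an identifier

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified split so both classes appear in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["y"],
    test_size=0.25, stratify=df["y"], random_state=0,
)

model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"hold-out ROC AUC: {auc:.3f}")
```

Keeping imputation, scaling, and encoding inside a single pipeline means they are fitted on the training split only, which avoids leaking test-set statistics into the baseline.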