This question evaluates a candidate's ability to build an end-to-end regularized regression pipeline in scikit-learn. It covers feature engineering and polynomial expansion, categorical encoding, target‑transform diagnostics, selection and tuning of penalized linear models (Ridge/Lasso/ElasticNet), cross‑validation strategies including TimeSeriesSplit, leakage prevention, nested evaluation, feature effect ranking, and model persistence. Commonly asked in Machine Learning interviews for Data Scientist roles, it requires both hands-on implementation proficiency and conceptual understanding of validation design, target transformations, and interpretability, with the emphasis on practical application.
You are given a cleaned tabular dataset with marketing and product metrics. Your goal is to predict daily signups using a robust, leakage‑free modeling workflow in scikit‑learn.
Assume the dataset is a pandas DataFrame with at least these columns:
Build a model to predict signups from spend, clicks, cpc, region, and a time trend extracted from date. Implement the following in scikit‑learn:
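As a hedged starting point, a workflow of this kind might be sketched as below. This is a minimal illustration under stated assumptions, not a reference solution: the data is synthetic, the region labels, the `trend` column name, the polynomial degree, and the `alpha` grid are all hypothetical choices, and only Ridge is shown. All preprocessing lives inside the `Pipeline`, so each `TimeSeriesSplit` fold fits the scaler and encoder on its training rows only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption: the real dataset is not shown in the prompt).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=n, freq="D"),
    "spend": rng.uniform(100, 1000, n),
    "clicks": rng.integers(50, 500, n),
    "region": rng.choice(["north", "south", "west"], n),  # hypothetical labels
})
df["cpc"] = df["spend"] / df["clicks"]

# Time trend: days elapsed since the first observation.
df["trend"] = (df["date"] - df["date"].min()).dt.days

# Made-up target so the example runs end to end.
df["signups"] = (
    0.05 * df["spend"] + 0.2 * df["clicks"] + 0.1 * df["trend"]
    + rng.normal(0, 5, n)
)

numeric = ["spend", "clicks", "cpc", "trend"]
categorical = ["region"]

# Numeric columns: degree-2 polynomial expansion, then scaling.
# Categorical columns: one-hot encoding that tolerates unseen levels.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("poly", PolynomialFeatures(degree=2, include_bias=False)),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([("prep", preprocess), ("ridge", Ridge())])

# Rows are already in date order, so TimeSeriesSplit always
# trains on the past and validates on the future.
search = GridSearchCV(
    model,
    param_grid={"ridge__alpha": [0.1, 1.0, 10.0, 100.0]},
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_mean_absolute_error",
)

X, y = df[numeric + categorical], df["signups"]
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```

Swapping `Ridge` for `Lasso` or `ElasticNet` only requires changing the final step and its parameter grid. For an unbiased performance estimate, the tuned search would itself need to be evaluated on a later held-out period or inside an outer time-ordered split.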