Train LinearSVC to beat a hidden baseline
Company: DRW
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Take-home Project
Implement train(X_train, y_train) and test(X_test) for a specified LinearSVC classifier so that it outperforms a provided baseline accuracy. Constraints: the model class is fixed (LinearSVC); you may change only preprocessing, feature engineering, and training hyperparameters. Propose and justify data-centric improvements (e.g., standardization/normalization, tokenization with TF–IDF or feature hashing for text, dimensionality reduction, outlier handling, class-imbalance strategies, feature crosses, data cleaning). Explain how you will keep tuning once the baseline is exceeded without using test accuracy as feedback, relying instead on a robust validation strategy (k-fold CV, nested CV, or a holdout set) that prevents data leakage. Provide reproducible code, an experiment log, and a plan for measuring generalization.
Quick Answer: This question evaluates a candidate's ability to design and implement a reproducible LinearSVC-based machine learning pipeline. It tests preprocessing, feature engineering, handling of mixed-type (numeric, categorical, text) data, hyperparameter tuning, validation strategy, and prevention of data leakage.
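For the mixed-type case mentioned above, a minimal sketch might route each column type through its own transformer before the fixed LinearSVC. The column names (`amount`, `venue`, `note`), the toy rows, and the labels are all hypothetical, invented only to make the example runnable. Because the whole preprocessing graph lives inside the pipeline, `cross_val_score` refits the imputer, scaler, encoder, and TF-IDF vocabulary on each training fold.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical toy data: one numeric, one categorical, one free-text column.
df = pd.DataFrame({
    "amount": [1.0, 2.0, 1.5, 3.0, 2.5, 1.2, 50.0, 60.0, 55.0, 70.0, 65.0, 58.0],
    "venue": ["A", "B", "A", "B", "A", "B", "C", "C", "C", "C", "C", "C"],
    "note": ["routine small order"] * 6 + ["urgent block trade"] * 6,
})
labels = [0] * 6 + [1] * 6

preprocess = ColumnTransformer([
    # Numeric: fill gaps with the median, then standardize.
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
     ["amount"]),
    # Categorical: one-hot; categories unseen during fit encode as all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["venue"]),
    # Text: a single column name (not a list) so the vectorizer gets 1-D input.
    ("txt", TfidfVectorizer(), "note"),
])

model = Pipeline([
    ("prep", preprocess),
    ("svc", LinearSVC(random_state=0, max_iter=10000)),
])

# Every preprocessing step is refit inside each of the 3 folds.
scores = cross_val_score(model, df, labels, cv=3, scoring="accuracy")

# Fit on all rows, then predict a new row that contains an unseen venue.
model.fit(df, labels)
new_row = pd.DataFrame(
    {"amount": [65.0], "venue": ["Z"], "note": ["urgent block order"]}
)
pred = model.predict(new_row)
```

Swapping `TfidfVectorizer` for `HashingVectorizer` would be one way to cap memory on a large vocabulary, at the cost of losing the mapping from features back to tokens.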