PracHub

Train LinearSVC to beat a hidden baseline

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design and implement a reproducible LinearSVC-based machine learning pipeline, testing competencies in preprocessing, feature engineering, mixed-type (numeric, categorical, text) data handling, hyperparameter tuning, validation strategy, and prevention of data leakage.

Company: DRW

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Take-home Project

Implement train(X_train, y_train) and test(X_test) for a specified LinearSVC classifier to outperform a provided baseline accuracy. Constraints: the model class is fixed (LinearSVC); you may modify preprocessing, feature engineering, and training hyperparameters only. Propose and justify data-centric improvements (e.g., standardization/normalization, tokenization and TF–IDF or hashing for text, dimensionality reduction, outlier handling, class imbalance strategies, feature crosses, data cleaning). Explain how you will tune without access to test accuracy once the baseline is exceeded, using a robust validation strategy (k-fold CV, nested CV, or a holdout set) while preventing data leakage. Provide reproducible code, an experiment log, and a plan for measuring generalization.

Jul 29, 2025, 12:00 AM

Take‑Home: Build a strong LinearSVC pipeline that beats a baseline and generalizes

Problem

You are given training features X_train and labels y_train and must implement:

  • train(X_train, y_train): trains a LinearSVC-based pipeline that beats a provided baseline accuracy using only preprocessing, feature engineering, and hyperparameters. It must be reproducible and robust to mixed data types (numeric, categorical, text).
  • test(X_test): loads the trained artifact and produces predictions for X_test without any data leakage.

Constraints:

  • The model class is fixed: LinearSVC. You may not change the final classifier family.
  • You may change preprocessing and feature engineering (e.g., standardization/normalization, tokenization + TF–IDF or hashing for text, dimensionality reduction, outlier handling, class imbalance strategies, feature crosses, data cleaning).
  • You must explain and implement a validation strategy that tunes hyperparameters without using test accuracy after the baseline is exceeded (e.g., k‑fold CV, nested CV, or a holdout set) while preventing data leakage.
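One possible shape for such a validation strategy is nested cross-validation over a pipeline, sketched below. The synthetic data, the parameter grid, and the fold counts are illustrative assumptions; the point is that the scaler sits inside the pipeline, so it is re-fitted on the training part of every fold, and that the outer loop estimates generalization without any reference to test accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for (X_train, y_train); the real data is not available here.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Preprocessing lives inside the pipeline so each CV split fits it on
# its own training folds only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LinearSVC(random_state=0, max_iter=5000)),
])

# Illustrative grid; only LinearSVC hyperparameters are tuned.
param_grid = {
    "clf__C": [0.01, 0.1, 1.0, 10.0],
    "clf__class_weight": [None, "balanced"],
}

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring="accuracy")

# Outer loop: each outer fold runs its own inner search, so the reported
# scores are not biased by the hyperparameter selection.
nested_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
print("nested CV accuracy: %.3f +/- %.3f" % (nested_scores.mean(), nested_scores.std()))

# Final model: rerun the inner search on all training data and keep its best pipeline.
search.fit(X, y)
final_model = search.best_estimator_
```

The nested score is the number to report as the generalization estimate; search.best_score_ on its own tends to be optimistic because the same folds were used to pick the hyperparameters.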

Deliverables:

  1. Reproducible code for train(X_train, y_train) and test(X_test) using LinearSVC, with robust preprocessing for typical tabular/text tasks.
  2. Justification of data‑centric improvements you applied.
  3. A tuning plan that does not consult test accuracy once baseline is exceeded, with a clear guard against leakage.
  4. An experiment log that records baseline, CV scores, chosen hyperparameters, and final training details.
  5. A plan for measuring out‑of‑sample generalization.
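The experiment log in deliverable 4 could be as simple as an append-only JSON Lines file. The sketch below uses only the standard library; the function names, the field names, and the numbers in the usage example are made up for illustration and are not prescribed by the task.

```python
import json
import os
import statistics
import tempfile
from datetime import datetime, timezone


def log_experiment(path, *, name, baseline_accuracy, cv_scores, params, seed, notes=""):
    """Append one experiment record to a JSON Lines file and return it."""
    scores = [float(s) for s in cv_scores]
    cv_mean = statistics.mean(scores)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "name": name,
        "baseline_accuracy": float(baseline_accuracy),
        "cv_scores": scores,
        "cv_mean": cv_mean,
        "cv_std": statistics.pstdev(scores),
        # Indicative only: the baseline is a test-set figure, cv_mean is not.
        "beats_baseline": cv_mean > float(baseline_accuracy),
        "params": params,
        "seed": seed,
        "notes": notes,
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_log(path):
    """Load all records back, one dict per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Usage with made-up numbers (hypothetical baseline and CV scores).
log_path = os.path.join(tempfile.mkdtemp(), "experiments.jsonl")
log_experiment(
    log_path,
    name="scaler + C=1.0",
    baseline_accuracy=0.80,
    cv_scores=[0.82, 0.79, 0.84],
    params={"clf__C": 1.0},
    seed=0,
    notes="illustrative values only",
)
```

Appending one line per run keeps earlier results immutable, and recording the seed alongside the chosen hyperparameters supports the reproducibility requirement.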

Assumptions (for completeness):

  • X_train/X_test are pandas DataFrames with a mix of numeric, categorical (string/low‑cardinality), and possibly text columns (free‑form strings). y_train is a 1D array-like of class labels (binary or multi‑class). If the dataset is purely tabular or purely text, the pipeline should adapt by auto‑detecting column types.
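Auto-detecting column types under these assumptions might look like the sketch below. The heuristic and its thresholds (`max_categories`, `min_avg_tokens`) are hypothetical and would need checking against the real data; the helper names are likewise illustrative. Each text column gets its own TF-IDF vectorizer and is passed to ColumnTransformer as a scalar column name, so the vectorizer receives a 1D series of strings.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler


def detect_column_types(df, max_categories=20, min_avg_tokens=3.0):
    """Split columns into numeric, categorical, and text (heuristic thresholds)."""
    numeric, categorical, text = [], [], []
    for col in df.columns:
        series = df[col]
        if is_numeric_dtype(series):  # note: boolean columns also count as numeric
            numeric.append(col)
            continue
        values = series.dropna().astype(str)
        avg_tokens = values.str.split().str.len().mean() if len(values) else 0.0
        # Long or high-cardinality strings are treated as free text.
        if avg_tokens >= min_avg_tokens or values.nunique() > max_categories:
            text.append(col)
        else:
            categorical.append(col)
    return numeric, categorical, text


def fill_text(series):
    """Replace missing text with empty strings so TF-IDF can tokenize it."""
    return series.fillna("").astype(str)


def build_preprocessor(numeric, categorical, text):
    """Assemble a ColumnTransformer from the detected column groups."""
    transformers = []
    if numeric:
        num = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
        transformers.append(("num", num, numeric))
    if categorical:
        cat = make_pipeline(
            SimpleImputer(strategy="most_frequent"),
            OneHotEncoder(handle_unknown="ignore"),
        )
        transformers.append(("cat", cat, categorical))
    for col in text:
        tfidf = make_pipeline(
            FunctionTransformer(fill_text),
            TfidfVectorizer(sublinear_tf=True),
        )
        # Scalar column name (not a list) -> transformer sees a 1D series.
        transformers.append((f"tfidf_{col}", tfidf, col))
    return ColumnTransformer(transformers)
```

Detection should run once on X_train inside train, with the resulting column lists stored in the fitted artifact, so that test applies the same column assignment rather than re-detecting types on X_test.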
