PracHub

Train LinearSVC to beat baseline accuracy

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a machine learning engineer's competence in building and validating a production-style classification pipeline with scikit-learn's LinearSVC, including preprocessing, hyperparameter tuning, cross-validation, and safeguarding against data leakage.

  • medium
  • DRW
  • ML System Design
  • Machine Learning Engineer

Company: DRW

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Take-home Project

##### Question

Implement train() and test() functions that train a LinearSVC model whose hidden-test accuracy beats a supplied baseline. Experiment with both model and data adjustments; in this task, data-side tweaks proved effective.

Aug 4, 2025, 10:55 AM

Task: Train and Evaluate a LinearSVC to Beat a Baseline

Context

You are given a binary or multi-class classification dataset split into train and hidden test sets. An accuracy baseline (e.g., a previous model or a heuristic) is provided. Your goal is to implement train() and test() functions to:

  • Train a LinearSVC model that beats the supplied baseline on the hidden test set.
  • Use sound validation (e.g., cross-validation) without peeking at the hidden test set.
  • Experiment with both model-level and data-level adjustments; document what you tried and which changes helped. Emphasize data-side tweaks where effective.

Assume input features may be numeric (dense or sparse) and/or text. You may implement a flexible preprocessing pipeline or parameterize your functions to handle either case.
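One way to parameterize the preprocessing is sketched below. It is a minimal illustration, not a reference solution: the function name build_pipeline, the feature_kind switch, and the chosen defaults (English stop words, sublinear TF, max_iter=5000) are all assumptions made for this example. StandardScaler is used with with_mean=False so that the same step also accepts sparse numeric input.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def build_pipeline(feature_kind="numeric", C=1.0, class_weight=None):
    """Hypothetical helper: pick a preprocessing step by input type.

    feature_kind is an assumed switch ("numeric" or "text"); the task
    statement only says features may be numeric and/or text.
    """
    if feature_kind == "text":
        # Raw strings -> sparse TF-IDF features.
        prep = TfidfVectorizer(stop_words="english", sublinear_tf=True)
    elif feature_kind == "numeric":
        # with_mean=False keeps sparse matrices sparse (no centering).
        prep = StandardScaler(with_mean=False)
    else:
        raise ValueError(f"unknown feature_kind: {feature_kind!r}")

    clf = LinearSVC(C=C, class_weight=class_weight, max_iter=5000)
    return Pipeline([("prep", prep), ("clf", clf)])
```

Because the preprocessing lives inside the Pipeline, any later cross-validation refits it on each training fold rather than on the full dataset.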

Requirements

  1. Implement train() that:
    • Accepts training data (X_train, y_train) and a baseline_accuracy.
    • Builds a scikit-learn Pipeline with preprocessing and a LinearSVC classifier.
    • Tunes key hyperparameters via cross-validation (report CV scores and best params).
    • Returns a fitted pipeline and training diagnostics (e.g., CV accuracy).
  2. Implement test() that:
    • Takes the fitted pipeline and (X_test, y_test).
    • Returns test accuracy (and a brief report) to confirm you beat the baseline.
  3. Experiments and reporting:
    • Try model-level tweaks (e.g., C, class_weight, feature selection) and data-side tweaks (e.g., scaling, TF–IDF, stop-word removal, deduplication).
    • Briefly summarize which changes improved accuracy. Note that data-side tweaks were especially effective.
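The requirements above could be met along the following lines. This is a sketch under stated assumptions rather than the expected answer: the preprocessor, cv_folds, and seed parameters, the parameter grid values, and the layout of the diagnostics dictionary are all choices made for illustration. The default preprocessor assumes numeric features; a TfidfVectorizer could be passed instead for text.

```python
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def train(X_train, y_train, baseline_accuracy, preprocessor=None, cv_folds=5, seed=0):
    """Tune a preprocessing + LinearSVC pipeline with cross-validation."""
    if preprocessor is None:
        # Assumed default for numeric input; works for dense and sparse.
        preprocessor = StandardScaler(with_mean=False)
    pipeline = Pipeline([
        ("prep", preprocessor),
        ("clf", LinearSVC(max_iter=5000, random_state=seed)),
    ])
    # Illustrative grid; the values are assumptions, not from the task.
    param_grid = {
        "clf__C": [0.01, 0.1, 1.0, 10.0],
        "clf__class_weight": [None, "balanced"],
    }
    cv = StratifiedKFold(n_splits=cv_folds, shuffle=True, random_state=seed)
    search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=cv, refit=True)
    search.fit(X_train, y_train)

    best = search.best_index_
    diagnostics = {
        "best_params": search.best_params_,
        "cv_accuracy": float(search.best_score_),
        "cv_accuracy_std": float(search.cv_results_["std_test_score"][best]),
        "beats_baseline_in_cv": bool(search.best_score_ > baseline_accuracy),
    }
    # best_estimator_ is the pipeline refit on all of the training data.
    return search.best_estimator_, diagnostics


def test(pipeline, X_test, y_test):
    """Score the fitted pipeline once on the held-out test set."""
    y_pred = pipeline.predict(X_test)
    accuracy = float(accuracy_score(y_test, y_pred))
    report = classification_report(y_test, y_pred, zero_division=0)
    return accuracy, report


# The required name collides with test-runner naming conventions;
# this flag keeps tools such as pytest from collecting it as a test case.
test.__test__ = False
```

A typical call would be `model, diag = train(X_train, y_train, baseline_accuracy)` followed by a single `accuracy, report = test(model, X_test, y_test)` at the very end, comparing `accuracy` against the baseline.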

Constraints

  • Classifier must be sklearn.svm.LinearSVC.
  • Use accuracy as the metric.
  • Avoid data leakage: fit all preprocessing only on training folds.
  • Use cross-validation on the training set; use the hidden test set only once, at the end.
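The leakage constraint can be illustrated for the text case as follows. This is a hedged sketch: the helper names deduplicate and leakage_free_cv_accuracy are hypothetical, and treating texts that match after lowercasing and whitespace collapsing as duplicates is an assumption. Deduplication runs before the folds are drawn, since identical rows that land in both a training fold and a validation fold can inflate CV accuracy. The vectorizer sits inside the pipeline, so cross_val_score fits it on the training folds only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def deduplicate(texts, labels):
    """Drop repeated texts, keeping the first occurrence and its label.

    Assumption: two texts count as duplicates if they match after
    lowercasing and collapsing whitespace.
    """
    seen = set()
    kept_texts, kept_labels = [], []
    for text, label in zip(texts, labels):
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            kept_texts.append(text)
            kept_labels.append(label)
    return kept_texts, kept_labels


def leakage_free_cv_accuracy(texts, labels, n_splits=5, seed=0):
    """Cross-validated accuracy with all fitting confined to training folds."""
    texts, labels = deduplicate(texts, labels)
    # TF-IDF is refit inside every fold because it is part of the pipeline.
    pipeline = make_pipeline(TfidfVectorizer(), LinearSVC(max_iter=5000))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(pipeline, texts, labels, scoring="accuracy", cv=cv)
    return float(scores.mean()), float(scores.std())
```

Fitting the vectorizer on the full training set before splitting would let validation-fold vocabulary and document frequencies influence the features, which is the kind of leakage the constraint rules out.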

