PracHub

Train and improve a scikit-learn binary classifier

Last updated: Mar 29, 2026

Quick Overview

Evaluates the ability to train, evaluate, and iteratively improve a scikit-learn binary classifier, encompassing model selection, preprocessing, validation practices, handling class imbalance, and interpretation of performance metrics.

Company: Perplexity

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

Dec 15, 2025, 12:00 AM
Practical ML fundamentals (Python + scikit-learn)

You are given a small toy binary-classification dataset (e.g., arrays/dataframes X_train, y_train, X_valid, y_valid or a single dataset you must split). Your task is to:

  1. Train a baseline binary classifier using scikit-learn.
    • Choose a reasonable model (e.g., logistic regression, linear SVM, random forest, gradient boosting).
    • Fit it on the training set.
  2. Evaluate the model on the validation set using one or more evaluation metrics.
    • Common choices: accuracy, precision/recall, F1, ROC-AUC, PR-AUC, confusion matrix.
  3. After you see the initial metric(s), improve the evaluation metric(s).
    • You may change the model, tune hyperparameters, adjust preprocessing, address class imbalance, change decision thresholds, or revise the validation approach.
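
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: the synthetic data from make_classification is a hypothetical stand-in for the interview dataset, logistic regression is only one of the reasonable baselines listed, and F1 is assumed to be the metric to improve. The improvement shown is a decision-threshold change; the other options in step 3 would work just as well as a first move.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    f1_score,
    precision_recall_curve,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the provided data: 1,000 rows, roughly 10% positives.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
# Stratify so both splits keep the same class ratio.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: fit a simple baseline on the training set only.
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)

# Step 2: evaluate on the validation set with several metrics.
proba = baseline.predict_proba(X_valid)[:, 1]  # scores for the positive class
pred = baseline.predict(X_valid)               # default 0.5 cutoff
baseline_f1 = f1_score(y_valid, pred)
print(confusion_matrix(y_valid, pred))
print(f"F1={baseline_f1:.3f}  "
      f"ROC-AUC={roc_auc_score(y_valid, proba):.3f}  "
      f"PR-AUC={average_precision_score(y_valid, proba):.3f}")

# Step 3: one possible improvement, picking the threshold that maximizes F1.
precision, recall, thresholds = precision_recall_curve(y_valid, proba)
# The last precision/recall pair has no threshold, so drop it before computing F1.
denom = np.maximum(precision[:-1] + recall[:-1], 1e-12)
f1_per_threshold = 2 * precision[:-1] * recall[:-1] / denom
best_threshold = thresholds[np.argmax(f1_per_threshold)]

tuned_pred = (proba >= best_threshold).astype(int)
tuned_f1 = f1_score(y_valid, tuned_pred)
print(f"threshold={best_threshold:.3f}  tuned F1={tuned_f1:.3f}")
```

Because the threshold here is chosen and scored on the same validation set, the tuned F1 is optimistic. With more data, it would be safer to pick the threshold by cross-validation on the training set, or to keep a separate test set for the final number. Changing the threshold moves F1, precision, and recall, but leaves ROC-AUC and PR-AUC unchanged because those are computed from the ranking of scores.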

Constraints / expectations

  • Use Python and scikit-learn APIs.
  • Keep the solution clean and reproducible (e.g., use Pipeline, set random_state, avoid data leakage).
  • Explain your choices and how each change is expected to affect the metric.
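
One way to meet these expectations is sketched below, again on hypothetical synthetic data. Wrapping the scaler and the classifier in a Pipeline means the scaler is refit on the training portion of each cross-validation fold, so no validation statistics leak into preprocessing. The seed value, the grid of C values, the choice of PR-AUC ("average_precision") as the scoring metric, and the use of class_weight="balanced" are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

SEED = 0  # assumed seed, reused everywhere so reruns give the same result

# Hypothetical stand-in for the provided data: 600 rows, roughly 15% positives.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.85, 0.15], random_state=SEED,
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=SEED
)

# Preprocessing lives inside the pipeline, so it is fit only on training folds.
pipe = Pipeline([
    ("scale", StandardScaler()),
    # class_weight="balanced" reweights classes inversely to their frequency.
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

# Tune regularization strength with stratified, seeded cross-validation.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
search = GridSearchCV(pipe, param_grid, scoring="average_precision", cv=cv)
search.fit(X_train, y_train)

# The validation set is touched once, after model selection is finished.
valid_scores = search.predict_proba(X_valid)[:, 1]
valid_ap = average_precision_score(y_valid, valid_scores)
print("best params:", search.best_params_)
print(f"CV PR-AUC={search.best_score_:.3f}  validation PR-AUC={valid_ap:.3f}")
```

When explaining this in an interview, the expected effects can be stated per change: scaling mainly helps regularized linear models converge and weight features comparably, class weighting usually trades some precision for recall on the minority class, and tuning C adjusts the bias-variance balance. Whether each change actually improves the chosen metric depends on the data, so the validation score should be compared against the baseline after every step.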
