PracHub

Build a Heart Disease Baseline

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in applied machine learning workflows, including data inspection and cleaning, exploratory data analysis, feature engineering, baseline binary classification modeling, and evaluation using Python libraries like pandas and seaborn.


Company: Hudson

Role: Software Engineer

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite


Mar 13, 2026, 12:00 AM

You are given a tabular dataset for predicting whether a patient has heart disease. The dataset contains a binary target column such as `has_heart_disease` and several features, for example age, height, weight, blood pressure, cholesterol, smoking status, and other clinical measurements.

Using Python, `pandas`, and `seaborn`, walk through how you would:

  1. Load and inspect the data.
  2. Clean missing values, duplicates, and obviously invalid records.
  3. Perform exploratory data analysis and visualize relationships between features and the target.
  4. Engineer useful features when appropriate, such as BMI from height and weight.
  5. Train a reasonable baseline model for binary classification.
  6. Evaluate the model and explain which metrics you would report.
  7. Summarize what patterns you found and what you would check before using the model in practice.

You may assume standard Python ML libraries are available.

Solution
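
One possible baseline for steps 5 and 6 is sketched below. It is not the site's official solution. It assumes a cleaned frame with the same hypothetical columns as before, regenerated synthetically here so the block runs on its own. Logistic regression, median imputation, and the 0.5 threshold are illustrative choices, not requirements of the question.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- Synthetic stand-in for the cleaned data (assumption, not real patients) ---
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(30, 80, n),
    "height_cm": rng.normal(170, 10, n),
    "weight_kg": rng.normal(78, 14, n),
    "systolic_bp": rng.normal(130, 18, n),
    "cholesterol": rng.normal(210, 35, n),
    "smoker": rng.integers(0, 2, n),
})
logit = 0.06 * (df["age"] - 55) + 0.03 * (df["systolic_bp"] - 130) + 0.8 * df["smoker"] - 0.4
df["has_heart_disease"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
df.loc[rng.choice(n, 50, replace=False), "cholesterol"] = np.nan  # some missing values
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

# --- Step 5: stratified split, then a pipeline fitted on the training part only ---
X = df.drop(columns="has_heart_disease")
y = df["has_heart_disease"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = make_pipeline(
    SimpleImputer(strategy="median"),   # imputation statistics come from X_train
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
# Trivial reference point: always predicts the training class prior.
dummy = DummyClassifier(strategy="prior").fit(X_train, y_train)

# --- Step 6: evaluate with threshold-free and threshold-based metrics ---
proba = model.predict_proba(X_test)[:, 1]
pred = (proba >= 0.5).astype(int)  # 0.5 is a placeholder threshold
metrics = {
    "roc_auc": roc_auc_score(y_test, proba),
    "pr_auc": average_precision_score(y_test, proba),
    "recall": recall_score(y_test, pred),
    "precision": precision_score(y_test, pred, zero_division=0),
    "dummy_roc_auc": roc_auc_score(y_test, dummy.predict_proba(X_test)[:, 1]),
}
cm = confusion_matrix(y_test, pred)
print(metrics)
print(cm)

# Standardized coefficients give a first, rough read on which features matter.
coefs = pd.Series(
    model.named_steps["logisticregression"].coef_[0], index=X.columns
).sort_values()
print(coefs)
```

In a screening setting, recall and the confusion matrix usually deserve as much attention as ROC AUC, because a missed case tends to cost more than a false alarm. For step 7, reasonable checks before any real use include probability calibration, target leakage from post-diagnosis features, performance across subgroups such as age bands and sex, and validation on data from a different site or time period.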
