You are given a tabular dataset for predicting whether a patient has heart disease. The dataset contains a binary target column such as has_heart_disease and several features, for example age, height, weight, blood pressure, cholesterol, smoking status, and other clinical measurements.
Using Python, pandas, and seaborn, walk through how you would:
- Load and inspect the data.
- Clean missing values, duplicates, and obviously invalid records.
- Perform exploratory data analysis and visualize relationships between features and the target.
- Engineer useful features when appropriate, such as BMI from height and weight.
- Train a reasonable baseline model for binary classification.
- Evaluate the model and explain which metrics you would report.
- Summarize what patterns you found and what you would check before using the model in practice.
You may assume standard Python ML libraries are available.
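
One possible shape for such a walkthrough is sketched below. It is a minimal illustration under stated assumptions, not a reference answer. Only `has_heart_disease` comes from the prompt. The other column names (`height_cm`, `weight_kg`, `systolic_bp`, `cholesterol`, `smoker`), the file name `heart.csv`, the plausibility bounds, and the choice of logistic regression as the baseline are all assumptions. Features are assumed to be numeric already. The synthetic data generator exists only so the sketch runs without a real file.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

TARGET = "has_heart_disease"  # from the prompt; every other column name is assumed
# Assumed plausibility bounds; real limits should come from clinical input.
BOUNDS = {"age": (0, 120), "height_cm": (50, 250), "weight_kg": (2, 400)}


def make_synthetic(n=400, seed=0):
    """Synthetic stand-in data for demonstration only (not real patients)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(30, 80, n).astype(float),
        "height_cm": rng.normal(170, 10, n),
        "weight_kg": rng.normal(80, 15, n),
        "systolic_bp": rng.normal(130, 15, n),
        "cholesterol": rng.normal(220, 40, n),
        "smoker": rng.integers(0, 2, n).astype(float),
    })
    risk = 0.05 * (df["age"] - 55) + 0.03 * (df["systolic_bp"] - 130) + 0.8 * df["smoker"]
    df[TARGET] = (rng.random(n) < 1 / (1 + np.exp(-risk))).astype(int)
    return df


def clean(df):
    """Drop exact duplicates, unlabeled rows, and out-of-range records."""
    df = df.drop_duplicates().dropna(subset=[TARGET])
    for col, (lo, hi) in BOUNDS.items():
        # Missing feature values are kept (imputed later); only invalid ones go.
        df = df[df[col].isna() | df[col].between(lo, hi)]
    return df.reset_index(drop=True)


def add_bmi(df):
    """BMI = weight in kg divided by squared height in metres."""
    out = df.copy()
    out["bmi"] = out["weight_kg"] / (out["height_cm"] / 100) ** 2
    return out


def plot_feature_by_target(df, feature):
    """Box plot of one feature split by the target, for EDA."""
    import seaborn as sns  # imported lazily so the other steps run without a plotting backend
    return sns.boxplot(data=df, x=TARGET, y=feature)


def fit_baseline(df, seed=0):
    """Fit a logistic-regression baseline and score it on a stratified hold-out split."""
    X, y = df.drop(columns=[TARGET]), df[TARGET].astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    # Imputation and scaling live inside the pipeline so they are fit on training data only.
    model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(),
                          LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    pred = model.predict(X_test)
    metrics = {
        "roc_auc": roc_auc_score(y_test, proba),
        "average_precision": average_precision_score(y_test, proba),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
    }
    return model, metrics


if __name__ == "__main__":
    # df = pd.read_csv("heart.csv")  # hypothetical file name for the real data
    df = make_synthetic()
    df.info()                          # dtypes and non-null counts
    print(df.isna().sum())             # missing values per column
    df = add_bmi(clean(df))
    _, metrics = fit_baseline(df)
    print(metrics)
```

In this sketch, recall is reported alongside precision because a missed case is usually the costlier error in a screening setting, and ROC AUC and average precision summarize ranking quality independently of the 0.5 threshold. Before practical use, reasonable checks would include target leakage among the features, probability calibration, performance across subgroups such as age bands or sex, and validation on data from a different site or time period.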