This question evaluates machine learning and data science competencies including regression model design, feature engineering, data-splitting and leakage prevention, selection of evaluation metrics, handling missing values, outliers and high-cardinality location features, temporal drift management, and model interpretation for house-price prediction. It is commonly asked in the Machine Learning domain to assess end-to-end practical application and conceptual understanding of validation and metric trade-offs, primarily testing practical application supported by conceptual reasoning and stakeholder communication.
You are asked to build a model to predict house sale prices for a city of your choice.
You have a historical dataset of home listings/sales with (examples):
sale_id
(string, unique)
city
(string)
sale_date
(date)
sale_price
(float, target)
bedrooms
(int),
bathrooms
(float),
sqft
(float),
lot_sqft
(float)
year_built
(int)
zipcode
/
neighborhood
(string)
lat
/
lon
(float)
property_type
(categorical)
days_on_market
(int)
Login required