Design a city home-price prediction system

This question evaluates proficiency in end-to-end machine learning system design for geospatial property price prediction, covering skills such as feature engineering, time- and spatial-aware validation, model selection, interpretability, fairness/compliance, deployment, monitoring, and error analysis.

Question

End-to-End System Design: Predict Residential Property Sale Prices

Context

You are tasked with building a production-grade machine learning system to predict the sale price of residential properties in a large city. You have ~10 years of historical, geocoded sales records with property attributes, plus external data (transit feeds in GTFS format, schools, zoning, crime, permits, points of interest, macroeconomic indicators). The system must generalize across time and neighborhoods, avoid leakage, and be explainable and compliant.

Requirements

Features and Engineering

Enumerate key feature groups: geospatial, transit accessibility, school quality, neighborhood effects, time-of-sale, macro factors, property attributes, and environmental factors.
Specify feature engineering: e.g., distance/travel-time to POIs, spatial lags and neighborhood aggregates, encodings for high-cardinality categoricals.
Explain how you will handle high-cardinality categoricals (e.g., neighborhood, school, zip) without leakage.
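As an illustration of two of the items above, here is a minimal sketch of a straight-line distance feature and an out-of-fold target encoding for a high-cardinality categorical. The function names (`haversine_km`, `oof_target_encode`), the `smoothing` parameter, and the fold assignment are assumptions made for this example, not part of the question. In a time-aware setup the folds would be ordered so that a sale is encoded only from earlier sales.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def oof_target_encode(categories, targets, fold_ids, smoothing=10.0):
    """Smoothed mean-target encoding computed out of fold.

    Each row is encoded using only rows from OTHER folds, so a row's own
    target never contributes to its own feature value. `smoothing` (assumed
    hyperparameter) pulls rare categories toward the out-of-fold global mean.
    """
    n = len(targets)
    encoded = [0.0] * n
    for fold in set(fold_ids):
        sums, counts = defaultdict(float), defaultdict(int)
        total, seen = 0.0, 0
        # Accumulate statistics from every row outside the current fold.
        for c, y, f in zip(categories, targets, fold_ids):
            if f != fold:
                sums[c] += y
                counts[c] += 1
                total += y
                seen += 1
        prior = total / seen if seen else 0.0
        # Encode the rows inside the current fold from those statistics.
        for i in range(n):
            if fold_ids[i] == fold:
                c = categories[i]
                encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded
```

With the default `smoothing`, a neighborhood that never appears outside the current fold falls back to the out-of-fold global mean rather than raising an error.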

Training and Validation Strategy

Propose a time-aware and spatially blocked cross-validation that avoids leakage and overly optimistic estimates.
State and justify evaluation metrics (e.g., RMSLE vs. RMSE vs. MAPE) and calibration checks.
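One possible shape for the validation scheme and metrics above, sketched under stated assumptions: homes are grouped into coarse latitude/longitude grid cells (the `cell_deg` size is arbitrary here), each split trains on earlier years and tests on a later year, and test rows come only from grid cells withheld from training. The row layout (`year`, `lat`, `lon` keys) and all function names are hypothetical.

```python
import math


def grid_block(lat, lon, cell_deg=0.02):
    """Map a point to a coarse grid cell so nearby homes share a block."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def time_spatial_splits(rows, cutoff_years, n_block_folds=3):
    """Yield (train_idx, test_idx) pairs.

    Train: sales strictly before the cutoff year, in blocks not held out.
    Test:  sales in the cutoff year, in the held-out blocks only.
    """
    blocks = [grid_block(r["lat"], r["lon"]) for r in rows]
    # Deterministic block -> fold assignment.
    block_fold = {b: i % n_block_folds for i, b in enumerate(sorted(set(blocks)))}
    for cutoff in cutoff_years:
        for k in range(n_block_folds):
            train = [i for i, r in enumerate(rows)
                     if r["year"] < cutoff and block_fold[blocks[i]] != k]
            test = [i for i, r in enumerate(rows)
                    if r["year"] == cutoff and block_fold[blocks[i]] == k]
            if train and test:
                yield train, test


def rmse(y_true, y_pred):
    """Root mean squared error, in price units; dominated by expensive homes."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """RMSE on log1p prices; penalizes relative rather than absolute error."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes strictly positive true prices."""
    return sum(abs(p - t) / t for t, p in zip(y_true, y_pred)) / len(y_true)
```

Holding out whole grid cells is only one way to block spatially; neighborhoods or clustered coordinates would serve the same purpose.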

Model Choices and Interpretability

Compare GBMs vs. Random Forests vs. linear models with interactions/GAMs, and propose a final approach.
Provide an interpretability plan (global and local), including how to communicate drivers of price.
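For the global part of an interpretability plan, one model-agnostic option is permutation importance: shuffle one feature at a time and measure how much the error grows. The sketch below is a plain-Python illustration, not a specific library's API; `predict` stands for any fitted model's per-row prediction function, and rows are assumed to be lists of numeric features.

```python
import random


def mean_abs_error(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MAE when each feature column is shuffled.

    predict: callable taking one feature row (a list) and returning a number.
    X: list of feature rows (lists); y: list of true targets.
    Returns one importance value per feature; larger means the model relies
    on that feature more, near zero means it is effectively unused.
    """
    rng = random.Random(seed)
    baseline = mean_abs_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only feature j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            err = mean_abs_error(y, [predict(row) for row in X_perm])
            increases.append(err - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

Correlated features (for example, living area and room count) can share or mask importance under this method, so it is usually read alongside local explanations rather than on its own.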

Fairness and Compliance

Identify potential proxies for protected classes (e.g., redlining risks) and how you will mitigate and test for them.
Outline documentation and reviews needed for compliance.
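One way to test for disparate error is to compare signed and absolute percentage errors across audit groups. In this sketch the group labels are hypothetical (for instance, tract-level tiers joined in for auditing only and never used as model features), and the function names are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def group_error_report(y_true, y_pred, groups):
    """Per-group sample count, median signed percentage error, and MAPE.

    A positive median signed error means the model tends to over-predict
    prices for that group; a negative value means under-prediction.
    """
    signed = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        signed[g].append((p - t) / t)  # assumes strictly positive true prices
    return {
        g: {
            "n": len(errs),
            "median_signed_pct_err": median(errs),
            "mape": sum(abs(e) for e in errs) / len(errs),
        }
        for g, errs in signed.items()
    }


def max_bias_gap(report):
    """Largest difference in median signed error between any two groups."""
    values = [stats["median_signed_pct_err"] for stats in report.values()]
    return max(values) - min(values)
```

A large gap does not by itself show which feature acts as a proxy; it flags where to look, for example by repeating the check after ablating suspect location features.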

Deployment, Monitoring, Ablations, and Error Analysis

Describe deployment architecture (batch vs. online), feature store, retraining cadence, and CI/CD.
List monitoring KPIs (performance, drift, calibration) and segment-based alerts.
Explain ablation studies and error analysis you would run to improve the model and build trust.
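For the drift KPI, a common choice is the population stability index (PSI) between a feature's training distribution and its recent serving distribution. The following is a minimal sketch; the bin count, the `eps` floor for empty bins, and the function name are assumptions, and alert thresholds (values around 0.1 and 0.25 are often quoted as rules of thumb) would need tuning per feature and segment.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a recent sample of one feature.

    Bin edges are taken from quantiles of the reference (`expected`) sample.
    Empty bins are floored at `eps` to keep the logarithm finite.
    """
    ref = sorted(expected)
    edges = [ref[int(len(ref) * k / n_bins)] for k in range(1, n_bins)]

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Bin index = number of edges at or below the value.
            idx = sum(1 for e in edges if v >= e)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e_frac = bin_fractions(expected)
    a_frac = bin_fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_frac, a_frac))
```

Computing this per segment (for example, per neighborhood or price band) rather than citywide makes it easier to wire into segment-based alerts.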
