Design a city home-price prediction system
Company: Citadel
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
Design an end-to-end system to predict residential property sale prices for a large city. Specify:
(1) Key features (geospatial, transit accessibility, school quality, neighborhood effects, time of sale, macro factors), feature engineering (e.g., distance to POIs, spatial lags), and handling of high-cardinality categorical variables.
(2) A train/validation strategy that avoids leakage: use time-aware and spatially blocked CV, and justify your metrics (e.g., RMSLE).
(3) Model choices (GBMs vs. RF vs. linear with interactions) and an interpretability plan.
(4) Fairness and compliance checks (e.g., redlining proxies).
(5) Deployment, monitoring, and how you would run ablations and error analysis.
Quick Answer: This question evaluates proficiency in end-to-end machine learning system design for geospatial property price prediction, covering skills such as feature engineering, time- and spatial-aware validation, model selection, interpretability, fairness/compliance, deployment, monitoring, and error analysis.
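As one illustration of part (2), the sketch below shows a minimal way to compute RMSLE and to build folds that are both time-aware and spatially blocked. It is not a reference implementation. The record fields (`lat`, `lon`, `sale_date`), the helper names (`grid_cell`, `time_spatial_folds`), the 0.02-degree grid size, and the round-robin assignment of grid cells to groups are all assumptions made for the example. Each fold trains only on sales before a cutoff date and outside the held-out spatial group, then validates on later sales inside that group.

```python
import math
from datetime import date


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative prices.

    Working on log1p(price) penalizes relative error, so a 10% miss on a
    cheap home and on an expensive home contribute about equally.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def grid_cell(lat, lon, cell_deg=0.02):
    """Map a coordinate to a coarse grid cell (cell size is an assumed value)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def time_spatial_folds(sales, cutoffs, n_groups=3, cell_deg=0.02):
    """Yield (train_idx, valid_idx) index lists.

    sales   : list of dicts with 'lat', 'lon', 'sale_date' (hypothetical schema)
    cutoffs : dates; the validation window for a cutoff runs until the next
              cutoff (or to the end of the data for the last one)
    """
    cells = [grid_cell(s["lat"], s["lon"], cell_deg) for s in sales]
    # Deterministic round-robin assignment of grid cells to spatial groups.
    group_of = {c: i % n_groups for i, c in enumerate(sorted(set(cells)))}
    groups = [group_of[c] for c in cells]

    cutoffs = sorted(cutoffs)
    for k, start in enumerate(cutoffs):
        end = cutoffs[k + 1] if k + 1 < len(cutoffs) else None
        for g in range(n_groups):
            # Train: strictly before the cutoff AND outside the held-out group.
            train = [
                i for i, s in enumerate(sales)
                if s["sale_date"] < start and groups[i] != g
            ]
            # Validate: inside the time window AND inside the held-out group.
            valid = [
                i for i, s in enumerate(sales)
                if s["sale_date"] >= start
                and (end is None or s["sale_date"] < end)
                and groups[i] == g
            ]
            if train and valid:  # skip degenerate folds
                yield train, valid
```

In practice one would probably also leave a spatial buffer between training and validation blocks, and compute spatial-lag or target-encoded features inside each fold using training rows only, so that neighboring sale prices from the validation period do not leak into the features.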