PracHub

Design city home-price prediction system

Last updated: May 29, 2026

Quick Overview

This question evaluates proficiency in end-to-end machine learning system design for geospatial property price prediction, covering skills such as feature engineering, time- and spatial-aware validation, model selection, interpretability, fairness/compliance, deployment, monitoring, and error analysis.

Company: Citadel

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

Design an end-to-end system to predict residential property sale prices for a large city. Specify: (1) Key features (geospatial, transit accessibility, school quality, neighborhood effects, time-of-sale, macro factors), feature engineering (e.g., distance-to-POIs, spatial lags), and handling of high-cardinality categorical variables. (2) A train/validation strategy that avoids leakage, using time-aware and spatially blocked CV, with justified metrics (e.g., RMSLE). (3) Model choices (GBMs vs. RF vs. linear with interactions) and an interpretability plan. (4) Fairness and compliance checks (e.g., redlining proxies). (5) Deployment, monitoring, and how you would run ablations and error analysis.

Oct 13, 2025, 9:49 PM

End-to-End System Design: Predict Residential Property Sale Prices

Context

You are tasked with building a production-grade machine learning system to predict the sale price of residential properties in a large city. You have ~10 years of historical, geocoded sales with property attributes, plus external data (transit GTFS, schools, zoning, crime, permits, points of interest, macroeconomic indicators). The system must generalize across time and neighborhoods, avoid leakage, and be explainable and compliant.

Requirements

  1. Features and Engineering
  • Enumerate key feature groups: geospatial, transit accessibility, school quality, neighborhood effects, time-of-sale, macro factors, property attributes, and environmental factors.
  • Specify feature engineering: e.g., distance/travel-time to POIs, spatial lags and neighborhood aggregates, encodings for high-cardinality categoricals.
  • Explain how you will handle high-cardinality categoricals (e.g., neighborhood, school, zip) without leakage.
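
As an illustration of the two engineering ideas above, the following is a minimal sketch, not a prescribed solution. It assumes great-circle (haversine) distance as a stand-in for true travel time, and it assumes fold labels are supplied by the caller, ideally matching the validation blocks. The function names `haversine_km`, `nearest_poi_km`, and `oof_target_encode` are hypothetical. The encoder computes each row's category mean only from rows in other folds, so a sale's own price never leaks into its feature.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_poi_km(lat, lon, pois):
    """Distance to the closest point of interest; pois is a list of (lat, lon) pairs."""
    return min(haversine_km(lat, lon, plat, plon) for plat, plon in pois)

def oof_target_encode(categories, targets, fold_ids, smoothing=10.0):
    """Out-of-fold smoothed mean encoding.

    Each row is encoded from rows in OTHER folds only. Rare categories are
    shrunk toward the out-of-fold global mean by the `smoothing` pseudo-count.
    """
    if len(set(fold_ids)) < 2:
        raise ValueError("need at least two folds for out-of-fold encoding")
    encoded = [0.0] * len(targets)
    for fold in set(fold_ids):
        sums, counts = defaultdict(float), defaultdict(int)
        total, m = 0.0, 0
        for c, y, f in zip(categories, targets, fold_ids):
            if f != fold:  # statistics come from the other folds
                sums[c] += y
                counts[c] += 1
                total += y
                m += 1
        prior = total / m
        for i, (c, f) in enumerate(zip(categories, fold_ids)):
            if f == fold:
                encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded
```

In practice the target passed to the encoder would likely be log price, and the folds would be ordered in time so that a row is never encoded with future sales; both choices are assumptions of this sketch rather than requirements stated in the prompt.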
  2. Training and Validation Strategy
  • Propose a time-aware and spatially blocked cross-validation that avoids leakage and overly optimistic estimates.
  • State and justify evaluation metrics (e.g., RMSLE vs. RMSE vs. MAPE) and calibration checks.
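
One way to combine time-aware and spatially blocked validation is sketched below, under stated assumptions: sales are dicts with hypothetical keys `lat`, `lon`, and `sale_date` (ISO date strings, which compare correctly as text), spatial blocks are square grid cells of `cell_deg` degrees, and each validation window holds out one group of cells. Training rows must be both earlier than the window and outside the held-out cells. RMSLE is included because it penalizes relative rather than absolute error, which suits prices spanning orders of magnitude.

```python
import math

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; assumes non-negative prices."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))

def block_group(lat, lon, cell_deg=0.02, n_groups=5):
    """Map a coordinate to a grid cell, then the cell to one of n_groups spatial groups."""
    bx, by = math.floor(lat / cell_deg), math.floor(lon / cell_deg)
    return (bx + by) % n_groups

def time_spatial_splits(rows, windows, cell_deg=0.02, n_groups=5):
    """Yield (train_idx, val_idx) for each (start, end) validation window.

    Validation: sales inside the window AND in the held-out spatial group.
    Training:   sales strictly before the window AND outside that group.
    """
    groups = [block_group(r["lat"], r["lon"], cell_deg, n_groups) for r in rows]
    for k, (start, end) in enumerate(windows):
        held_out = k % n_groups  # rotate the held-out group across windows
        train = [i for i, r in enumerate(rows)
                 if r["sale_date"] < start and groups[i] != held_out]
        val = [i for i, r in enumerate(rows)
               if start <= r["sale_date"] < end and groups[i] == held_out]
        yield train, val
```

This sketch omits a spatial buffer between training and validation cells; because adjacent cells fall in different groups, a real design would probably drop training rows within some distance of the held-out cells or use larger, neighborhood-shaped blocks.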
  3. Model Choices and Interpretability
  • Compare GBMs vs. Random Forests vs. linear models with interactions/GAMs, and propose a final approach.
  • Provide an interpretability plan (global and local), including how to communicate drivers of price.
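
For the global side of an interpretability plan, one model-agnostic option (an assumption here, since the prompt does not prescribe a method) is permutation importance: shuffle one feature column at a time and record how much the error grows. The sketch below assumes `predict` is any callable taking a single feature row (a list) and returning a price; the names `mse` and `permutation_importance` are hypothetical.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled.

    X is a list of feature rows (lists); predict maps one row to a prediction.
    Returns one importance value per column, in column order.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

Importance should be measured on held-out data, and correlated features (for example, several distance-to-transit measures) can share or mask importance, so grouped permutations may be more informative than single columns.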
  4. Fairness and Compliance
  • Identify potential proxies for protected classes (e.g., redlining risks) and how you will mitigate and test for them.
  • Outline documentation and reviews needed for compliance.
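
A simple fairness test is to compare signed valuation error across audit groups. The sketch below is one possible check, assuming group labels (for example, census-tract demographic bands) are available for auditing only and are never fed to the model; the function names and the choice of median signed percentage error are illustrative assumptions.

```python
from collections import defaultdict
from statistics import median

def signed_pct_error_by_group(y_true, y_pred, groups):
    """Median of (pred - true) / true per audit group.

    Negative values mean the group is systematically under-valued.
    """
    errs = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        errs[g].append((p - t) / t)
    return {g: median(v) for g, v in errs.items()}

def max_gap(per_group):
    """Largest difference in median signed error between any two groups."""
    vals = list(per_group.values())
    return max(vals) - min(vals)
```

A gap above a threshold agreed with compliance reviewers would trigger investigation of which features drive the disparity; the threshold itself is a policy decision outside this sketch.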
  5. Deployment, Monitoring, Ablations, and Error Analysis
  • Describe deployment architecture (batch vs. online), feature store, retraining cadence, and CI/CD.
  • List monitoring KPIs (performance, drift, calibration) and segment-based alerts.
  • Explain ablation studies and error analysis you would run to improve the model and build trust.
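
For drift monitoring, one commonly used KPI is the population stability index (PSI) between a feature's training distribution and its recent serving distribution. The sketch below is a minimal version under stated assumptions: equal-width bins derived from the reference sample, a small `eps` floor to avoid log of zero, and the hypothetical function name `psi`.

```python
import math

def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index of `actual` against the reference `expected`.

    Bins are equal-width over the range of `expected`; values outside that
    range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # floor empty bins at eps so the logarithm is defined
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A widely quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but alert thresholds should be tuned per feature and per segment rather than taken from this sketch.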
