PracHub

Design a house-price prediction model

Last updated: Mar 29, 2026

Quick Overview

This question tests end-to-end regression modeling for house-price prediction: feature engineering, data splitting and leakage prevention, choice of evaluation metrics, handling of missing values, outliers, and high-cardinality location features, management of temporal drift, and model interpretation. It is commonly asked in the Machine Learning domain and primarily tests practical application, supported by conceptual reasoning about validation and metric trade-offs and by stakeholder communication.

Company: Two Sigma

Role: Data Scientist

Category: Machine Learning

Difficulty: easy

Interview Round: Technical Screen

Dec 15, 2025, 12:00 AM

Problem

You are asked to build a model to predict house sale prices for a city of your choice.

Data (assume typical real-estate fields)

You have a historical dataset of home listings/sales with (examples):

  • sale_id (string, unique)
  • city (string)
  • sale_date (date)
  • sale_price (float, target)
  • bedrooms (int), bathrooms (float), sqft (float), lot_sqft (float)
  • year_built (int)
  • zipcode / neighborhood (string)
  • lat / lon (float)
  • property_type (categorical)
  • days_on_market (int)
  • Optional external features (if you choose): school ratings, crime, interest rates, nearby transit, etc.
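
As a minimal sketch, the fields listed above could be represented as a typed record with a few sanity checks run during data cleaning. The class name SaleRecord, the function basic_issues, and every check in it are illustrative assumptions, not part of the question; the checks are examples only, and real cleaning rules would depend on the chosen city's data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SaleRecord:
    """One historical sale; field names follow the list above (hypothetical class)."""
    sale_id: str
    city: str
    sale_date: date
    sale_price: float                    # prediction target
    bedrooms: Optional[int] = None       # None marks a missing value
    bathrooms: Optional[float] = None
    sqft: Optional[float] = None
    lot_sqft: Optional[float] = None
    year_built: Optional[int] = None
    zipcode: Optional[str] = None
    neighborhood: Optional[str] = None
    lat: Optional[float] = None
    lon: Optional[float] = None
    property_type: Optional[str] = None
    days_on_market: Optional[int] = None


def basic_issues(rec: SaleRecord) -> List[str]:
    """Return a list of data-quality problems found in one record.

    The rules below are illustrative assumptions, not a complete cleaning policy.
    """
    issues = []
    if rec.sale_price <= 0:
        issues.append("non-positive sale_price")
    if rec.sqft is not None and rec.sqft <= 0:
        issues.append("non-positive sqft")
    if rec.lat is not None and not -90 <= rec.lat <= 90:
        issues.append("lat out of range")
    if rec.lon is not None and not -180 <= rec.lon <= 180:
        issues.append("lon out of range")
    return issues
```

Keeping missing values as None at this stage, rather than filling them immediately, leaves the imputation choice to the modeling step, where it can be fit on training data only.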

Tasks

  1. Propose an end-to-end approach (data cleaning → feature engineering → model selection).
  2. Define how you would split data (train/validation/test) and avoid leakage.
  3. Choose evaluation metrics and justify them (e.g., MAE vs RMSE vs MAPE).
  4. Explain how you would handle:
    • missing values and outliers
    • high-cardinality location features (zip/neighborhood)
    • temporal drift (market changes)
  5. Describe how you would interpret the model and communicate results to stakeholders.
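
One possible way to approach tasks 2 to 4 is sketched below in plain Python: a chronological split on sale_date, a smoothed mean-price encoding for zipcode that is fit on training rows only, and the three metrics named in task 3. All function names, the dict-based row format, the cutoff dates, and the smoothing value are assumptions made for illustration; the question does not prescribe any of them.

```python
import math
from datetime import date


def chronological_split(rows, train_end, valid_end):
    """Split by sale_date so validation and test rows are never earlier than training rows."""
    train = [r for r in rows if r["sale_date"] < train_end]
    valid = [r for r in rows if train_end <= r["sale_date"] < valid_end]
    test = [r for r in rows if r["sale_date"] >= valid_end]
    return train, valid, test


def fit_zip_encoding(train_rows, smoothing=10.0):
    """Smoothed mean sale_price per zipcode, computed from training rows only.

    Zipcodes with few sales are pulled toward the global mean.
    """
    global_mean = sum(r["sale_price"] for r in train_rows) / len(train_rows)
    sums, counts = {}, {}
    for r in train_rows:
        z = r["zipcode"]
        sums[z] = sums.get(z, 0.0) + r["sale_price"]
        counts[z] = counts.get(z, 0) + 1
    encoding = {
        z: (sums[z] + smoothing * global_mean) / (counts[z] + smoothing)
        for z in sums
    }
    return encoding, global_mean


def encode_zip(row, encoding, global_mean):
    """Look up the encoded value; zipcodes unseen in training fall back to the global mean."""
    return encoding.get(row["zipcode"], global_mean)


def mae(y_true, y_pred):
    """Mean absolute error, in price units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; undefined if any true price is zero."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Tiny made-up dataset, for illustration only.
rows = [
    {"sale_date": date(2023, 3, 1), "zipcode": "A", "sale_price": 100.0},
    {"sale_date": date(2023, 6, 1), "zipcode": "A", "sale_price": 100.0},
    {"sale_date": date(2023, 9, 1), "zipcode": "B", "sale_price": 400.0},
    {"sale_date": date(2024, 2, 1), "zipcode": "A", "sale_price": 120.0},
    {"sale_date": date(2024, 8, 1), "zipcode": "C", "sale_price": 250.0},
]
train, valid, test = chronological_split(rows, date(2024, 1, 1), date(2024, 7, 1))
encoding, global_mean = fit_zip_encoding(train, smoothing=1.0)
```

Fitting the encoding on the training split alone is what keeps target information from validation and test rows out of the features. The same rule applies to imputation values and outlier thresholds. On the metric choice, MAE is less sensitive to a handful of very expensive homes, RMSE weights those large errors more heavily, and MAPE expresses error relative to price, which can be easier to communicate to stakeholders.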