PracHub

Design a house-price prediction model

Last updated: Mar 29, 2026

Quick Overview

This question tests machine learning and data science skills applied to house-price prediction: regression model design, feature engineering, data splitting and leakage prevention, choice of evaluation metrics, handling of missing values, outliers, and high-cardinality location features, management of temporal drift, and model interpretation. It is commonly asked in Machine Learning interviews to assess end-to-end practical application, backed by a conceptual understanding of validation and metric trade-offs and by clear stakeholder communication.

Company: Two Sigma

Role: Data Scientist

Category: Machine Learning

Difficulty: easy

Interview Round: Technical Screen



Dec 15, 2025, 12:00 AM

Problem

You are asked to build a model to predict house sale prices for a city of your choice.

Data (assume typical real-estate fields)

You have a historical dataset of home listings and sales with fields such as:

  • sale_id (string, unique)
  • city (string)
  • sale_date (date)
  • sale_price (float, target)
  • bedrooms (int), bathrooms (float), sqft (float), lot_sqft (float)
  • year_built (int)
  • zipcode / neighborhood (string)
  • lat / lon (float)
  • property_type (categorical)
  • days_on_market (int)
  • Optional external features (if you choose): school ratings, crime, interest rates, nearby transit, etc.
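
As a rough illustration of the field list above, the record below mirrors the listed columns as a typed Python structure. The names `SaleRecord` and `parse_sale` are hypothetical, and the sketch assumes ISO-formatted dates, string-valued raw input, and that every field except the identifier, city, date, and target may be missing. The question does not specify any of this.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SaleRecord:
    """Hypothetical typed view of one row of the dataset described above."""
    sale_id: str
    city: str
    sale_date: date
    sale_price: float                      # regression target
    bedrooms: Optional[int] = None
    bathrooms: Optional[float] = None
    sqft: Optional[float] = None
    lot_sqft: Optional[float] = None
    year_built: Optional[int] = None
    zipcode: Optional[str] = None
    lat: Optional[float] = None
    lon: Optional[float] = None
    property_type: Optional[str] = None
    # Often known only once a listing closes, so it is a candidate
    # leakage feature depending on when predictions are made.
    days_on_market: Optional[int] = None


def parse_sale(raw: dict) -> SaleRecord:
    """Convert a raw string-valued row into a SaleRecord.

    Assumption: empty strings and absent keys both mean "missing".
    """
    def opt(key, cast):
        value = raw.get(key)
        return None if value in (None, "") else cast(value)

    return SaleRecord(
        sale_id=str(raw["sale_id"]),
        city=str(raw["city"]),
        sale_date=date.fromisoformat(raw["sale_date"]),
        sale_price=float(raw["sale_price"]),
        bedrooms=opt("bedrooms", int),
        bathrooms=opt("bathrooms", float),
        sqft=opt("sqft", float),
        lot_sqft=opt("lot_sqft", float),
        year_built=opt("year_built", int),
        zipcode=opt("zipcode", str),
        lat=opt("lat", float),
        lon=opt("lon", float),
        property_type=opt("property_type", str),
        days_on_market=opt("days_on_market", int),
    )
```

Keeping missing values as `None` at parse time, rather than filling them immediately, leaves the imputation choice to a later step that can be fitted on training data only.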

Tasks

  1. Propose an end-to-end approach (data cleaning → feature engineering → model selection).
  2. Define how you would split data (train/validation/test) and avoid leakage.
  3. Choose evaluation metrics and justify them (e.g., MAE vs RMSE vs MAPE).
  4. Explain how you would handle:
    • missing values and outliers
    • high-cardinality location features (zip/neighborhood)
    • temporal drift (market changes)
  5. Describe how you would interpret the model and communicate results to stakeholders.
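
Tasks 2 to 4 can be made concrete with a small, dependency-free sketch. It is one possible approach rather than a reference solution: it splits rows chronologically by `sale_date`, defines MAE, RMSE, and MAPE, and builds a smoothed per-zipcode mean price fitted on training rows only. The function names, the cutoff dates, and the `prior_weight` value are all assumptions made for illustration.

```python
from math import sqrt


def chronological_split(rows, train_end, valid_end):
    """Split by sale_date so validation and test are strictly later than train."""
    train = [r for r in rows if r["sale_date"] <= train_end]
    valid = [r for r in rows if train_end < r["sale_date"] <= valid_end]
    test = [r for r in rows if r["sale_date"] > valid_end]
    return train, valid, test


def mae(y_true, y_pred):
    """Mean absolute error: average miss in price units, robust to outliers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large misses more heavily than MAE."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; assumes no zero prices."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smoothed_zip_means(train_rows, prior_weight=5.0):
    """Per-zipcode mean price shrunk toward the global training mean.

    Fitting on training rows only keeps validation and test prices out of
    the encoding. Zipcodes with few sales are pulled toward the global mean.
    """
    global_mean = sum(r["sale_price"] for r in train_rows) / len(train_rows)
    sums, counts = {}, {}
    for r in train_rows:
        z = r["zipcode"]
        sums[z] = sums.get(z, 0.0) + r["sale_price"]
        counts[z] = counts.get(z, 0) + 1
    encoding = {
        z: (sums[z] + prior_weight * global_mean) / (counts[z] + prior_weight)
        for z in sums
    }
    return encoding, global_mean
```

At prediction time, a zipcode absent from the training period can fall back to the global mean, for example `encoding.get(zipcode, global_mean)`. A stricter variant would compute the encoding out-of-fold within the training set so that a row's own price never contributes to its feature.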
