PracHub

Design features for house price prediction

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competence in feature engineering, data preprocessing, baseline regression modeling, and model evaluation for tabular price prediction problems.

Company: Two Sigma

Role: Data Scientist

Category: Machine Learning

Interview Round: Technical Screen

Related Interview Questions

  • Analyze Temperatures and Update Regression - Two Sigma (medium)
  • How would you forecast bike demand? - Two Sigma (hard)
  • Predict Bike Dock Demand - Two Sigma (hard)
  • Predict bike demand and avoid overfitting - Two Sigma (hard)
  • How detect duplicate card records? - Two Sigma (medium)

Two Sigma · Data Scientist · Technical Screen · Machine Learning · Jan 22, 2026

Scenario

You are building a model to predict house sale price from a tabular dataset (similar to typical real-estate datasets). The interviewer expects a simple baseline model (e.g., linear regression), but wants to understand your reasoning.
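
A minimal sketch of such a baseline, assuming Python with NumPy and a synthetic dataset. The columns `sqft` and `age` and the generating coefficients are hypothetical, not drawn from any real housing data. The sketch fits ordinary least squares to log price, which makes each coefficient readable as an approximate percentage effect.

```python
import numpy as np

# Synthetic stand-in for a housing table; columns and coefficients are hypothetical.
rng = np.random.default_rng(0)
n = 500
sqft = rng.uniform(600, 4000, n)    # living area, square feet
age = rng.uniform(0, 80, n)         # years since construction
# Assumed generating process: log price is linear in log(sqft) and age, plus noise.
log_price = 4.0 + 1.0 * np.log(sqft) - 0.004 * age + rng.normal(0, 0.1, n)

# Design matrix with an explicit intercept column; fit by ordinary least squares.
X = np.column_stack([np.ones(n), np.log(sqft), age])
coef, *_ = np.linalg.lstsq(X, log_price, rcond=None)
intercept, b_log_sqft, b_age = coef

print(f"intercept={intercept:.2f}  log_sqft={b_log_sqft:.3f}  age={b_age:.4f}")
```

With log price as the target, `b_log_sqft` reads as an elasticity (a 1% larger house is associated with roughly a `b_log_sqft`% higher price), and `b_age` as an approximate fractional price change per year of age. On real data these are associations, not causal effects.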

Questions

  1. Which features are likely to be predictive of house price, and why? (Examples: location, size, age, condition, amenities, nearby schools, etc.)
  2. How do you decide which features are usable (available at prediction time, not leaking label information, stable definitions)?
  3. What data cleaning steps would you perform before modeling?
  4. If starting with linear regression, how would you:
    • handle missing values,
    • handle categorical variables,
    • reduce the impact of outliers/skewed price distributions,
    • detect multicollinearity and mitigate it?
  5. How would you evaluate the model and iterate on improvements?
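
For item 4, here is a minimal NumPy sketch of common linear-regression preprocessing steps. It covers median imputation plus a missing-value indicator, one-hot encoding that drops one reference level, clipping features to training-set quantiles to blunt outliers, a `log1p` price target for skew, and variance inflation factors (VIF) to flag multicollinearity. All helper names and the synthetic columns (`sqft`, `rooms`, `age`, `lot`, `hood`) are hypothetical. Fill values, category levels, and clip bounds are learned on the training split so they can be reused unchanged on the holdout.

```python
import numpy as np

def impute_with_indicator(col, fill=None):
    """Median-impute NaNs; return (filled, missing indicator, fill value).
    Pass the training fill value when transforming holdout data."""
    missing = np.isnan(col)
    if fill is None:
        fill = float(np.nanmedian(col))
    return np.where(missing, fill, col), missing.astype(float), fill

def one_hot(values, levels=None):
    """One-hot encode, dropping the first sorted level as the reference
    (avoids perfect collinearity with the intercept). Unseen levels -> all zeros."""
    values = np.asarray(values)
    if levels is None:
        levels = sorted(set(values.tolist()))[1:]
    return np.column_stack([(values == lvl).astype(float) for lvl in levels]), levels

def clip_to_quantiles(col, bounds=None, q=(0.01, 0.99)):
    """Winsorize a numeric feature to training-set quantiles."""
    if bounds is None:
        bounds = tuple(np.quantile(col, q))
    return np.clip(col, *bounds), bounds

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the other columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic training data (hypothetical columns).
rng = np.random.default_rng(1)
n = 300
sqft = rng.uniform(800, 3500, n)
rooms = sqft / 400 + rng.normal(0, 0.3, n)   # nearly determined by sqft: collinear
age = rng.uniform(0, 60, n)
lot = rng.uniform(2000, 10000, n)
lot[rng.random(n) < 0.1] = np.nan            # roughly 10% missing lot sizes
hood = rng.choice(["A", "B", "C"], n)
price = np.exp(11 + 0.0004 * sqft - 0.005 * age + rng.normal(0, 0.2, n))

lot_filled, lot_missing, lot_fill = impute_with_indicator(lot)
sqft_clipped, sqft_bounds = clip_to_quantiles(sqft)
hood_dummies, hood_levels = one_hot(hood)
y = np.log1p(price)                           # compresses the right-skewed price scale

X = np.column_stack([sqft_clipped, rooms, age, lot_filled, lot_missing, hood_dummies])
vifs = vif(X)
print(np.round(vifs, 1))
```

On this synthetic data `sqft` and `rooms` get VIFs far above the commonly cited rule-of-thumb cutoffs of 5 or 10, while `age` stays near 1. Typical mitigations are dropping or combining one of the pair (for example, keeping only `sqft`) or switching to ridge regression, which tolerates correlated inputs. For holdout rows, call the same helpers with the stored `lot_fill`, `hood_levels`, and `sqft_bounds`.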

Assume you have a training set with historical sales and a holdout set for evaluation.
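
Here is a minimal sketch of the holdout evaluation, assuming NumPy and synthetic data with a hypothetical `sale_year` column. The split is by time (train on earlier sales, evaluate on the most recent ones) so the holdout resembles future predictions, and the model is compared against a naive predictor. Using RMSE on log price as the headline metric, the assumed 3% yearly price drift, and all column names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
sale_year = rng.integers(2015, 2025, n)      # hypothetical sale years, 2015..2024
sqft = rng.uniform(700, 3800, n)
# Assumed process: prices drift upward ~3% per year on top of size effects.
log_price = 5.0 + 0.9 * np.log(sqft) + 0.03 * (sale_year - 2015) + rng.normal(0, 0.15, n)

# Time-based split: train on earlier sales, hold out the most recent ones.
train = sale_year < 2023
test = ~train

def design(mask):
    """Intercept, log living area, and a linear time trend for the selected rows."""
    return np.column_stack([np.ones(mask.sum()), np.log(sqft[mask]), sale_year[mask] - 2015])

coef, *_ = np.linalg.lstsq(design(train), log_price[train], rcond=None)
pred = design(test) @ coef

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# RMSE on log price is roughly a typical relative error (0.15 is about 15%).
model_rmse = rmse(log_price[test], pred)
# Naive baseline: predict the training median log price for every holdout house.
naive_rmse = rmse(log_price[test], np.full(test.sum(), np.median(log_price[train])))
print(f"model RMSE (log): {model_rmse:.3f}  naive RMSE (log): {naive_rmse:.3f}")
```

Iterating then means changing one thing at a time (a new feature, a transformation, an interaction) and keeping it only if holdout error improves over both the naive predictor and the previous model. Breaking residuals down by neighborhood or price band can point to the next feature worth trying.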

© 2026 PracHub. All rights reserved.