Build House Price Model Responsibly
Company: Capital One
Role: Data Scientist
Category: Machine Learning
Difficulty: easy
Interview Round: Onsite
You are asked two machine-learning questions.
**Part A: House-price prediction**
Using a cleaned housing dataset with target `sale_price`, describe an end-to-end approach for building a predictive model.
Your answer should cover:
1. train, validation, and test splitting strategy,
2. target transformation and metric choice such as RMSE vs MAE vs RMSLE,
3. handling categorical features, missing values, and outliers,
4. baseline model vs stronger models,
5. leakage checks,
6. how you would explain your approach if you used an off-the-shelf modeling package during the interview.
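As one way to ground items 1, 2, and 4 above, the sketch below splits a dataset, fits two constant baselines, and compares RMSE, MAE, and RMSLE on the validation split. It is a minimal illustration using only the standard library, not a reference solution. The synthetic prices, the split fractions, and the helper names (`split_rows`, `rmse`, `mae`, `rmsle`) are assumptions made for the example; only the target name `sale_price` comes from the question.

```python
import math
import random
import statistics


def split_rows(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmsle(y_true, y_pred):
    # log1p keeps the metric defined at zero and penalizes relative error.
    return math.sqrt(
        sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred))
        / len(y_true)
    )


# Synthetic, right-skewed prices (an assumption standing in for real data).
rng = random.Random(42)
rows = [{"sale_price": rng.lognormvariate(12.3, 0.5)} for _ in range(200)]

train, val, test = split_rows(rows)
y_train = [r["sale_price"] for r in train]
y_val = [r["sale_price"] for r in val]

# Baseline 1: predict the training median (the constant that minimizes MAE).
median_pred = statistics.median(y_train)

# Baseline 2: predict the mean in log space, mapped back to price.
# Among constants, this minimizes RMSLE on the training data.
log_mean_pred = math.expm1(statistics.fmean(math.log1p(y) for y in y_train))

for name, const in [("median", median_pred), ("log-mean", log_mean_pred)]:
    preds = [const] * len(y_val)
    print(
        f"{name:9s} RMSE={rmse(y_val, preds):,.0f} "
        f"MAE={mae(y_val, preds):,.0f} RMSLE={rmsle(y_val, preds):.3f}"
    )
```

Any stronger model would be judged against these validation numbers, with the test split touched only once at the end. If the data carry a sale date, a time-ordered split would likely be a better choice than the random shuffle used here.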
**Part B: Face-recognition ethics**
A company wants to deploy face recognition in a high-impact setting. What are the main ethical and ML risks? How would you evaluate subgroup performance and calibration? What operational safeguards or governance would you require before deployment, or before recommending against it?
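The subgroup evaluation asked about above can be sketched as a per-group error report for a face-verification setting. This is a hedged illustration rather than a prescribed method: the record layout `(group, score, is_match)`, the group labels, the 0.5 threshold, and the function name `subgroup_error_rates` are all hypothetical. The calibration gap shown is only meaningful if scores are intended to be read as match probabilities.

```python
from collections import defaultdict


def subgroup_error_rates(records, threshold):
    """Per-group false match rate, false non-match rate, and calibration gap.

    records: iterable of (group, score, is_match) for comparison pairs,
    where is_match is True when both images show the same person.
    """
    counts = defaultdict(
        lambda: {"fm": 0, "impostor": 0, "fnm": 0, "genuine": 0, "score_sum": 0.0, "n": 0}
    )
    for group, score, is_match in records:
        c = counts[group]
        c["n"] += 1
        c["score_sum"] += score
        if is_match:
            c["genuine"] += 1
            if score < threshold:  # a true match was rejected
                c["fnm"] += 1
        else:
            c["impostor"] += 1
            if score >= threshold:  # a non-match was accepted
                c["fm"] += 1

    report = {}
    for group, c in counts.items():
        report[group] = {
            "fmr": c["fm"] / c["impostor"] if c["impostor"] else None,
            "fnmr": c["fnm"] / c["genuine"] if c["genuine"] else None,
            # Mean score minus observed match rate; near 0 suggests calibration.
            "calibration_gap": c["score_sum"] / c["n"] - c["genuine"] / c["n"],
        }
    return report


# Tiny made-up sample; real evaluations need far more pairs per group.
records = [
    ("group_a", 0.95, True), ("group_a", 0.90, True),
    ("group_a", 0.20, False), ("group_a", 0.10, False),
    ("group_b", 0.85, True), ("group_b", 0.40, True),
    ("group_b", 0.70, False), ("group_b", 0.15, False),
]

report = subgroup_error_rates(records, threshold=0.5)
for group, metrics in sorted(report.items()):
    print(group, metrics)
```

In this toy sample the two groups have the same mix of genuine and impostor pairs but very different error rates at the shared threshold, which is the kind of gap a pre-deployment review would need to surface. With realistic sample sizes, each rate should also carry a confidence interval before groups are compared.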
Quick Answer: This question evaluates a data scientist's ability to design an end-to-end supervised learning pipeline: train/validation/test strategy, target and metric selection, handling of categorical features, missing values, and outliers, model benchmarking, and leakage detection. It also covers responsible-AI considerations such as subgroup performance evaluation, calibration, ethical risks, and deployment governance. It is commonly asked in Machine Learning interviews to test technical modeling skill together with ethical and operational judgment, at both the conceptual and practical level.