Engineer and Impute ZIP Features
Company: Intuit
Role: Data Scientist
Category: Machine Learning
Difficulty: Medium
Interview Round: Technical Screen
You are building a predictive model for a product team. For some users, you have address fields such as street, city, state, and ZIP code.
1. What features would you derive directly from the address, and what external public datasets could you join on ZIP code or geography to create additional model features?
2. If ZIP code is missing for some users, how would you handle those cases?
In your answer, discuss:
- useful geographic and socioeconomic features
- high-cardinality encoding choices
- when to drop vs. impute missing ZIP codes
- missingness as a potentially informative signal
- fairness, privacy, and leakage risks
- how you would evaluate whether these features improve model performance
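As one possible starting point for the first few discussion items, the sketch below cleans a raw ZIP value and derives a coarser 3-digit prefix plus an explicit missingness flag. The function names (`normalize_zip`, `zip_features`), the `"MISSING"` sentinel, and the rule of accepting 3 to 5 digits are illustrative assumptions, not part of the question.

```python
import re
from typing import Optional


def normalize_zip(raw: object) -> Optional[str]:
    """Return a 5-digit ZIP string, or None if the value is unusable.

    Assumption: values with 3 to 5 digits are ZIPs that lost leading zeros
    (for example, when the column was stored as an integer).
    """
    if raw is None:
        return None
    # Accept an optional ZIP+4 suffix such as "94043-1351" and ignore it.
    match = re.match(r"\s*(\d{3,5})(?:-\d{4})?\s*$", str(raw))
    if not match:
        return None
    # Left-pad so that 2139 becomes "02139".
    return match.group(1).zfill(5)


def zip_features(raw: object) -> dict:
    """Build simple ZIP-derived features, keeping missingness explicit."""
    zip5 = normalize_zip(raw)
    return {
        # A sentinel category lets a model treat "missing" as its own level.
        "zip5": zip5 if zip5 else "MISSING",
        # The 3-digit prefix is a coarser, lower-cardinality region key.
        "zip3": zip5[:3] if zip5 else "MISSING",
        # The indicator lets the model learn whether missingness is informative.
        "zip_missing": int(zip5 is None),
    }
```

Keeping the indicator alongside any imputed or sentinel value means the choice between dropping and imputing can be tested empirically rather than decided up front.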
Quick Answer: This question evaluates feature engineering, missing-data handling, high-cardinality encoding, and awareness of fairness, privacy, and data-leakage risks in predictive modeling.
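One common way to combine high-cardinality encoding with leakage control is out-of-fold target encoding with smoothing. The sketch below is a minimal, dependency-free illustration under stated assumptions: the function name `oof_target_encode`, the fold assignment by row index (which presumes rows are already shuffled), and the default `smoothing` value are all hypothetical choices, not a prescribed method.

```python
from collections import defaultdict
from typing import List, Sequence


def oof_target_encode(
    keys: Sequence[str],
    targets: Sequence[float],
    n_folds: int = 5,
    smoothing: float = 10.0,
) -> List[float]:
    """Encode each row's key with a smoothed target mean from the other folds.

    Each row's encoding never uses that row's own target, which limits
    target leakage. Rare or unseen keys shrink toward the training prior.
    """
    if smoothing <= 0:
        raise ValueError("smoothing must be positive")
    n = len(keys)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        total, seen = 0.0, 0
        # Accumulate statistics from every row NOT in the held-out fold.
        for i in range(n):
            if i % n_folds != fold:
                sums[keys[i]] += targets[i]
                counts[keys[i]] += 1
                total += targets[i]
                seen += 1
        prior = total / seen if seen else 0.0
        # Encode the held-out rows with the smoothed mean.
        for i in range(fold, n, n_folds):
            key = keys[i]
            encoded[i] = (sums.get(key, 0.0) + smoothing * prior) / (
                counts.get(key, 0) + smoothing
            )
    return encoded
```

If missing ZIPs are mapped to a sentinel key before encoding, they receive their own smoothed estimate, which is one way to let missingness carry signal. Comparing cross-validated metrics with and without the encoded column is a simple check on whether the feature helps.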