Predict Future Car Sale Prices
Company: Circle
Role: Data Scientist
Category: Machine Learning
Difficulty: Medium
Interview Round: Technical Screen
You are given transaction-level data from car dealers for calendar year 2018, and you need to predict vehicle sale prices for transactions that will happen after 2018.
Assume each row is a completed vehicle sale with fields such as:
- sale_date
- dealer_id
- make, model, trim, model_year
- mileage
- condition_score
- fuel_type, transmission, body_type, color
- accident_history, number_of_owners
- dealer_location
- final_sale_price
Answer the following:
1. How would you frame the prediction problem, define the target, and split the data so evaluation reflects future deployment rather than random historical fit?
2. What feature engineering would you do for a vehicle price prediction model?
3. Which features would you exclude because they create target leakage or would not be available at prediction time?
4. Which models and evaluation metrics would you compare, and how would you handle nonlinearity, seasonality, and distribution shift between 2018 and later periods?
5. Suppose you are shown the output of a fitted linear regression model, including coefficients, standard errors, p-values, and R-squared. How would you interpret the coefficients for continuous and categorical variables, and what caveats would you mention before using those results for business decisions?
6. If the dataset only contains completed sales and not unsold inventory, what selection-bias issues might arise?
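For question 1, the core idea is that validation rows must come strictly later in time than training rows, so the score resembles what deployment after 2018 would look like. The sketch below is one minimal way to express that using only the Python standard library. The helper names (time_split, expanding_folds), the cutoff dates, and the toy rows are illustrative assumptions, not part of the original problem; only sale_date and final_sale_price come from the field list.

```python
from datetime import date

def time_split(rows, cutoff):
    """Split sales into train (before cutoff) and validation (on or after cutoff)."""
    train = [r for r in rows if r["sale_date"] < cutoff]
    valid = [r for r in rows if r["sale_date"] >= cutoff]
    return train, valid

def expanding_folds(rows, cutoffs):
    """Yield (train, valid) pairs where each fold trains on everything before
    a cutoff and validates on the window up to the next cutoff."""
    ordered = sorted(cutoffs)
    for start, end in zip(ordered, ordered[1:]):
        train = [r for r in rows if r["sale_date"] < start]
        valid = [r for r in rows if start <= r["sale_date"] < end]
        yield train, valid

# Toy rows, made up purely for illustration.
rows = [
    {"sale_date": date(2018, 2, 14), "final_sale_price": 18500.0},
    {"sale_date": date(2018, 6, 3), "final_sale_price": 22100.0},
    {"sale_date": date(2018, 10, 1), "final_sale_price": 15900.0},
    {"sale_date": date(2018, 11, 20), "final_sale_price": 30250.0},
]

# Single holdout: train on January to September, validate on October to December.
train, valid = time_split(rows, cutoff=date(2018, 10, 1))

# Expanding-window folds over hypothetical cutoffs within 2018.
folds = list(expanding_folds(
    rows, [date(2018, 6, 1), date(2018, 10, 1), date(2019, 1, 1)]
))
```

A random split would mix late-2018 rows into training and tend to overstate accuracy. With only one calendar year available, a time-ordered split still cannot fully separate seasonality from trend, which is worth stating as a caveat.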
Quick Answer: This question evaluates machine learning and data science competencies including temporal framing and validation of predictive tasks, target specification, feature engineering and leakage identification, model selection and evaluation metrics, interpretation of regression outputs, and awareness of selection bias.
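For question 5, one common setup (an assumption here, since the prompt does not specify the target transformation) is to regress log(final_sale_price) on the features. In that case a coefficient b translates to an approximate percentage change in price of 100 * (exp(b) - 1) per unit of a continuous feature, or relative to the omitted reference level for a dummy-coded categorical. The sketch below shows that arithmetic. The coefficient and standard-error values are hypothetical, and the interval uses a normal approximation.

```python
import math

def pct_effect(coef):
    """Percent change in price implied by a coefficient in a log(price) model."""
    return (math.exp(coef) - 1.0) * 100.0

def pct_interval(coef, se, z=1.96):
    """Approximate 95% interval for the percent effect (normal approximation)."""
    return pct_effect(coef - z * se), pct_effect(coef + z * se)

# Hypothetical coefficient on mileage measured in units of 10,000 miles.
mileage_coef, mileage_se = -0.05, 0.01
mileage_pct = pct_effect(mileage_coef)  # roughly -4.9% per extra 10,000 miles
mileage_lo, mileage_hi = pct_interval(mileage_coef, mileage_se)

# Hypothetical dummy for transmission == "automatic", with "manual" as the
# reference level; the effect is relative to that reference, not absolute.
automatic_pct = pct_effect(0.08)  # roughly +8.3% versus manual
```

These readings hold other included features fixed and describe association, not causation. Correlated features such as model_year and mileage can make individual coefficients unstable even when R-squared is high.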