Design a time-series home-buy decision classifier

This is an ML System Design interview question from Citadel for Data Scientist roles.

Question

Take‑Home: Classifying Buy‑Now vs Wait Decisions in Housing Time Series

Context

You are given a monthly panel of regional housing and macro time series (e.g., price indices, mortgage rates, inventory, days‑on‑market, unemployment, CPI). The goal is to build a system that, for each region and month t, outputs a calibrated probability and a recommendation: buy now or wait (i.e., buy later, within the next k months).

Task

Describe, at design level and with enough specificity to implement:

Target and horizon
- Define the decision horizon k and a rigorous target label y_t for month t.
- Clarify economic assumptions and edge cases (e.g., transaction costs, right‑censoring).
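
One possible label for the target and horizon above, as a minimal sketch rather than the intended solution: month t is labeled 1 (buy now) when the price index k months ahead exceeds the current level by more than a fractional transaction-cost hurdle, and the final k months are treated as right-censored. The function name buy_now_labels, the cost parameter, and the sample index values are illustrative assumptions, not part of the original question.

```python
from typing import List, Optional


def buy_now_labels(prices: List[float], k: int, cost: float = 0.0) -> List[Optional[int]]:
    """Label each month 1 (buy now) or 0 (wait) from a price-index series.

    Assumed rule: buy now if the k-month forward return exceeds `cost`.
    Months whose k-month-ahead price is not yet observed get None
    (right-censored) and should be excluded from training.
    """
    labels: List[Optional[int]] = []
    for t, p_now in enumerate(prices):
        if t + k >= len(prices):
            labels.append(None)  # future price not observed yet
            continue
        forward_return = prices[t + k] / p_now - 1.0
        labels.append(1 if forward_return > cost else 0)
    return labels


# Hypothetical monthly index levels for one region.
index_levels = [100.0, 101.0, 103.0, 102.0, 106.0, 104.0]
labels = buy_now_labels(index_levels, k=2, cost=0.02)
print(labels)  # [1, 0, 1, 0, None, None]
```

A richer label could also net out rent paid while waiting or mortgage-rate changes over the horizon; those are left out here to keep the rule easy to audit.
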
Data preprocessing
- Panel alignment by region and month, handling multiple data vintages if applicable.
- Missing‑value strategy, outliers, scaling, and seasonality/deflation adjustments.
Temporal feature engineering
- Lags, rolling statistics, deltas (m/m, y/y), seasonality dummies, and interaction features.
- Handling non‑stationarity (e.g., differencing, deflation, time‑weighted fitting).
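
The lag, delta, and rolling-window features listed above could be built along these lines. This is a sketch under the assumption that the series is a strictly positive index level with one value per month; the function name temporal_features and the feature keys are hypothetical. Every feature for month t reads only values at or before t.

```python
from typing import Dict, List, Optional


def temporal_features(series: List[float], t: int, window: int = 3) -> Dict[str, Optional[float]]:
    """Build features for month index t using only values at or before t."""

    def pct_change(lag: int) -> Optional[float]:
        # Relative change versus `lag` months earlier; None if history is too short.
        if t - lag < 0:
            return None
        return series[t] / series[t - lag] - 1.0

    rolling_mean: Optional[float] = None
    if t - window + 1 >= 0:
        # Trailing window that ends at t (inclusive), never looking ahead.
        rolling_mean = sum(series[t - window + 1 : t + 1]) / window

    return {
        "lag_1": series[t - 1] if t >= 1 else None,
        "mom": pct_change(1),    # month-over-month change
        "yoy": pct_change(12),   # year-over-year change
        "rolling_mean": rolling_mean,
    }


# Hypothetical 13 months of index levels: 100, 101, ..., 112.
levels = [100.0 + i for i in range(13)]
features_latest = temporal_features(levels, t=12)
features_first = temporal_features(levels, t=0)
```

Using relative changes instead of raw levels is one simple way to reduce non-stationarity; differencing deflated (real) series would be an alternative.
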
Time‑aware validation
- Train/validation/test splits that respect time.
- Walk‑forward (rolling/expanding window) cross‑validation and hyperparameter tuning.
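
An expanding-window walk-forward split can be sketched as follows. The function name walk_forward_splits and its parameters are assumptions for illustration. The gap argument leaves an embargo between the end of training and the start of testing; with a label that looks k months ahead, a gap of at least k keeps training labels from using prices inside the test window.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_months: int, min_train: int, test_size: int, gap: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Training always starts at month 0 and ends before `start`; the test
    block begins `gap` months later so the two never overlap in time.
    """
    start = min_train
    while start + gap + test_size <= n_months:
        train_idx = list(range(0, start))
        test_idx = list(range(start + gap, start + gap + test_size))
        yield train_idx, test_idx
        start += test_size  # roll the origin forward by one test block


splits = list(walk_forward_splits(n_months=10, min_train=4, test_size=2, gap=1))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

Hyperparameters would be tuned on these folds only, with a final untouched block of the most recent months held out as the test set.
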
Models
- Baselines and candidate models (e.g., logistic regression with time features, gradient boosting, sequence models).
- Rationale for choices given data size, interpretability, and regime risk.
Metrics and decisioning
- Probabilistic metrics (AUC, Brier, calibration) and cost‑sensitive objectives reflecting asymmetric risks.
- Derive a thresholding rule tied to user costs/utilities.
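
A thresholding rule tied to costs can be derived under a simple assumed cost model: a wrong "buy now" costs c_fb, a wrong "wait" costs c_fw, and correct decisions cost nothing. If p is the calibrated probability that buying now is right, buying has expected cost (1 - p) * c_fb and waiting has expected cost p * c_fw, so buying is preferred when p > c_fb / (c_fb + c_fw). The sketch below encodes that rule together with the Brier score; the function names and cost values are hypothetical.

```python
from typing import Sequence


def buy_threshold(cost_false_buy: float, cost_false_wait: float) -> float:
    """Probability above which 'buy now' has lower expected cost than 'wait'."""
    return cost_false_buy / (cost_false_buy + cost_false_wait)


def recommend(p_buy: float, cost_false_buy: float, cost_false_wait: float) -> str:
    """Map a calibrated probability to a recommendation under the assumed costs."""
    return "buy now" if p_buy > buy_threshold(cost_false_buy, cost_false_wait) else "wait"


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Assumed asymmetry: a mistaken purchase is three times as costly as a missed one.
threshold = buy_threshold(cost_false_buy=3.0, cost_false_wait=1.0)  # 0.75
decision_low = recommend(0.70, 3.0, 1.0)   # "wait"
decision_high = recommend(0.80, 3.0, 1.0)  # "buy now"
score = brier_score([0.9, 0.2, 0.6], [1, 0, 0])
```

The rule is only as good as the calibration of p, which is why the Brier score and calibration curves matter alongside AUC.
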
Leakage controls
- Methods to prevent look‑ahead bias and data leakage (including macro data release lags and revisions).
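
One way to guard against release lags and revisions is a point-in-time lookup that returns only the figure that had been published by the decision month. The sketch below assumes each observation is stored with both the month it describes and the month it was released; the Vintage record, the as_of function, and the sample numbers are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Vintage(NamedTuple):
    ref_month: int      # month the figure describes
    release_month: int  # month the figure became public
    value: float


def as_of(vintages: List[Vintage], ref_month: int, decision_month: int) -> Optional[float]:
    """Return the latest value for ref_month that was public by decision_month."""
    known = [
        v for v in vintages
        if v.ref_month == ref_month and v.release_month <= decision_month
    ]
    if not known:
        return None  # nothing published yet: the feature must be treated as missing
    return max(known, key=lambda v: v.release_month).value


# Hypothetical unemployment figure for month 10: first released in month 11,
# then revised in month 13.
history = [Vintage(10, 11, 4.1), Vintage(10, 13, 4.3)]
seen_in_month_10 = as_of(history, ref_month=10, decision_month=10)  # None
seen_in_month_12 = as_of(history, ref_month=10, decision_month=12)  # 4.1
seen_in_month_13 = as_of(history, ref_month=10, decision_month=13)  # 4.3
```

Training on the revised 4.3 for a decision made in month 12 would leak information the user could not have had at the time.
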
Concept drift and monitoring
- How to detect, diagnose, and handle drift post‑deployment; retraining cadence.
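
As one example of a drift check, the population stability index (PSI) compares the binned distribution of a feature or model score in recent data against a reference period. This is a sketch, not the only option: the function name psi, the fixed bin edges, and the eps floor for empty bins are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population stability index between a reference and a recent sample.

    `edges` are interior bin boundaries; empty bins are floored at `eps`
    so the logarithm stays defined.
    """

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            bin_index = sum(v > e for e in edges)  # number of edges below v
            counts[bin_index] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, new = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(ref, new))


# Hypothetical model scores from a reference window and a shifted recent window.
reference_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
recent_scores = [s + 0.3 for s in reference_scores]
bin_edges = [0.25, 0.5, 0.75]

psi_same = psi(reference_scores, reference_scores, bin_edges)
psi_shifted = psi(reference_scores, recent_scores, bin_edges)
```

A PSI of zero means the binned distributions match; larger values signal a shift that may warrant diagnosis or retraining. Any alert cutoff would need to be chosen and validated for the specific features.
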
User presentation
- How to present a calibrated probability and recommendation to end users, including explanations and scenario analysis.
