How do I approach ML System Design interview questions?

ML System Design questions require a solid grasp of core concepts as well as practice. PracHub provides solutions with explanations to help you master ML System Design interviews.

What difficulty level is this interview question?

This is a hard ML System Design question, commonly asked during take-home project rounds at Citadel.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Citadel during technical interviews.

Design a time-series home-buy decision classifier

Quick Overview

This question evaluates competency in ML system design for time-series decisioning, including rigorous target and horizon definition, temporal feature engineering, time-aware validation, probabilistic calibration and thresholding, leakage controls, model selection trade-offs, and deployment monitoring.

Take‑Home: Classifying Buy‑Now vs Wait Decisions in Housing Time Series

Context

You are given a monthly panel of regional housing and macro time series (e.g., price indices, mortgage rates, inventory, days‑on‑market, unemployment, CPI). The goal is to build a system that, for each region and month t, outputs a calibrated probability and a recommendation: buy now vs. wait (i.e., defer the purchase to some later month within the next k months).

Task

Describe, at design level and with enough specificity to implement:

Target and horizon
- Define the decision horizon k and a rigorous target label y_t for month t.
- Clarify economic assumptions and edge cases (e.g., transaction costs, right‑censoring).
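
One possible way to make the target concrete, as a minimal sketch rather than the expected answer: label month t as "buy now" (1) when waiting up to k months never offers a price more than a tolerance tau below the current price, and mark the last k months as right-censored because their outcome is not yet observable. The function name `buy_now_labels`, the tolerance `tau` (a stand-in for transaction and waiting costs), and the single-series input are all assumptions made for illustration.

```python
from typing import List, Optional


def buy_now_labels(prices: List[float], k: int, tau: float = 0.0) -> List[Optional[int]]:
    """Return one label per month for a single region's price series.

    1    -> buying at t was at least as good as waiting (within tolerance tau)
    0    -> some month in (t, t+k] offered a price below prices[t] * (1 - tau)
    None -> right-censored: fewer than k future months are observed
    """
    n = len(prices)
    labels: List[Optional[int]] = []
    for t in range(n):
        if t + k >= n:
            # The full k-month horizon is not observed yet, so no label.
            labels.append(None)
            continue
        future_min = min(prices[t + 1 : t + k + 1])
        labels.append(1 if future_min >= prices[t] * (1.0 - tau) else 0)
    return labels


# Hypothetical index values, k = 2 months, 2% cost tolerance.
example = buy_now_labels([100, 102, 101, 95, 97, 103], k=2, tau=0.02)
```

Censored rows would typically be dropped from training and never scored as if their outcome were known.
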
Data preprocessing
- Panel alignment by region and month, handling multiple data vintages if applicable.
- Missing‑value strategy, outliers, scaling, and seasonality/deflation adjustments.
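
As an illustration of two of the preprocessing steps listed above, the sketch below forward-fills short gaps only (so long outages stay missing rather than being silently invented) and deflates a nominal series by CPI. The helper names `forward_fill` and `deflate`, the `max_gap` limit, and the choice of base CPI are assumptions, not requirements from the question.

```python
from typing import List, Optional

Series = List[Optional[float]]


def forward_fill(values: Series, max_gap: int) -> Series:
    """Carry the last observed value forward over at most max_gap
    consecutive missing months; longer gaps remain None."""
    filled: Series = []
    last: Optional[float] = None
    gap = 0
    for v in values:
        if v is not None:
            last, gap = v, 0
            filled.append(v)
        else:
            gap += 1
            filled.append(last if last is not None and gap <= max_gap else None)
    return filled


def deflate(nominal: Series, cpi: Series, base_cpi: float) -> Series:
    """Convert nominal values to real terms at the base CPI level."""
    return [
        None if p is None or c is None else p * base_cpi / c
        for p, c in zip(nominal, cpi)
    ]
```

Forward filling uses only past values, so it does not introduce look-ahead on its own; the CPI used for deflation should still be the figure that was published by month t.
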
Temporal feature engineering
- Lags, rolling statistics, deltas (m/m, y/y), seasonality dummies, and interaction features.
- Handling non‑stationarity (e.g., differencing, deflation, time‑weighted fitting).
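
The feature families above can be sketched for a single series as follows. This is a hedged example: the function name `temporal_features`, the feature keys, and the 12-month minimum history are illustrative choices. The key property is that every feature for month t reads values at or before index t only.

```python
from typing import Dict, List, Optional


def temporal_features(series: List[float], t: int) -> Optional[Dict[str, float]]:
    """Build features for month index t from values at or before t.

    Returns None when fewer than 12 prior months exist, because the
    year-over-year change cannot be computed yet.
    """
    if t < 12:
        return None
    return {
        "lag_1": series[t - 1],                       # previous month's level
        "diff_1": series[t] - series[t - 1],          # first difference
        "mom_pct": series[t] / series[t - 1] - 1.0,   # month-over-month change
        "yoy_pct": series[t] / series[t - 12] - 1.0,  # year-over-year change
        "roll_mean_3": sum(series[t - 2 : t + 1]) / 3.0,  # trailing 3-month mean
    }


# Hypothetical index rising by one point per month.
series = [100.0 + i for i in range(14)]
feats = temporal_features(series, 13)
```

Differences and percentage changes are a simple way to reduce non-stationarity in trending levels such as price indices.
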
Time‑aware validation
- Train/validation/test splits that respect time.
- Walk‑forward (rolling/expanding window) cross‑validation and hyperparameter tuning.
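
An expanding-window walk-forward split can be sketched as below. The generator name `walk_forward_splits` and its parameters are assumptions for illustration. The `gap` argument leaves an embargo between the end of training and the start of testing; setting it to at least the horizon k keeps training labels, which look k months ahead, from overlapping the test period.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_periods: int, min_train: int, test_size: int, gap: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) over month indices 0..n_periods-1.

    Training always starts at month 0 and grows each fold (expanding window).
    `gap` months between train and test are skipped entirely.
    """
    start = min_train + gap
    while start + test_size <= n_periods:
        train = list(range(0, start - gap))
        test = list(range(start, start + test_size))
        yield train, test
        start += test_size


splits = list(walk_forward_splits(n_periods=10, min_train=4, test_size=2, gap=1))
```

Hyperparameters would be tuned on the inner folds only, with the final test block touched once at the end. For panel data the indices here stand for months, so all regions in a given month fall on the same side of the split.
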
Models
- Baselines and candidate models (e.g., logistic regression with time features, gradient boosting, sequence models).
- Rationale for choices given data size, interpretability, and regime risk.
Metrics and decisioning
- Probabilistic metrics (AUC, Brier, calibration) and cost‑sensitive objectives reflecting asymmetric risks.
- Derive a thresholding rule tied to user costs/utilities.
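
A threshold tied to costs follows from comparing expected losses. Assume, for illustration, that p is a calibrated probability that buying now is the right call, that a wrong "buy" costs `cost_false_buy`, that a wrong "wait" costs `cost_false_wait`, and that correct decisions cost nothing. Recommending "buy" has expected cost (1 - p) * cost_false_buy, and recommending "wait" has expected cost p * cost_false_wait, so "buy" is preferred when p >= cost_false_buy / (cost_false_buy + cost_false_wait). The sketch below implements that rule together with the Brier score; all names are hypothetical.

```python
from typing import List


def buy_threshold(cost_false_buy: float, cost_false_wait: float) -> float:
    """Probability above which 'buy now' has lower expected cost than 'wait'."""
    return cost_false_buy / (cost_false_buy + cost_false_wait)


def recommend(p_buy: float, threshold: float) -> str:
    """Map a calibrated probability to a recommendation."""
    return "buy now" if p_buy >= threshold else "wait"


def brier_score(probs: List[float], outcomes: List[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# A wrong 'buy' assumed three times as costly as a wrong 'wait'.
threshold = buy_threshold(cost_false_buy=3.0, cost_false_wait=1.0)
```

The rule is only as good as the calibration of p, which is why calibration is checked alongside ranking metrics such as AUC.
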
Leakage controls
- Methods to prevent look‑ahead bias and data leakage (including macro data release lags and revisions).
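
One way to respect release lags and revisions is an as-of lookup: for a decision made in month t, use only figures whose release month is at or before t, and among those take the latest reference month in its most recently released vintage. The sketch below assumes a hypothetical record layout of (reference_month, release_month, value) with months encoded as integers.

```python
from typing import List, Optional, Tuple

Vintage = Tuple[int, int, float]  # (reference_month, release_month, value)


def as_of_value(vintages: List[Vintage], decision_month: int) -> Optional[Tuple[int, float]]:
    """Return (reference_month, value) for the freshest figure that had
    actually been published by decision_month, or None if nothing had."""
    available = [v for v in vintages if v[1] <= decision_month]
    if not available:
        return None
    # Prefer the latest reference month; within it, the latest release.
    ref, _release, value = max(available, key=lambda v: (v[0], v[1]))
    return ref, value


# Hypothetical unemployment prints: month 1 is released in month 2 and
# revised in month 3; month 2 is released in month 3 and revised in month 4.
vintages = [(1, 2, 4.0), (1, 3, 4.1), (2, 3, 3.9), (2, 4, 3.8)]
```

Training on final revised values while serving first prints is a common source of leakage, so the same lookup would be applied when building historical training rows.
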
Concept drift and monitoring
- How to detect, diagnose, and handle drift post‑deployment; retraining cadence.
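
For drift detection, one commonly used summary is the Population Stability Index (PSI) between a reference window and a recent window of a feature or of the model's scores. The sketch below is one possible implementation; the function name, the fixed bin `edges` (assumed sorted ascending), and the `eps` floor for empty bins are assumptions. Alert thresholds such as 0.1 or 0.25 are rules of thumb that would need validating on this data.

```python
import math
from typing import List


def population_stability_index(
    reference: List[float], current: List[float], edges: List[float], eps: float = 1e-6
) -> float:
    """PSI = sum over bins of (cur_share - ref_share) * ln(cur_share / ref_share).

    Bin i holds values v with exactly i edges <= v, so len(edges) + 1 bins.
    Shares are floored at eps to avoid log(0) for empty bins.
    """

    def shares(values: List[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(1 for e in edges if v >= e)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical model scores: training period vs. a recent month.
stable = population_stability_index([0.1, 0.2, 0.6, 0.7], [0.2, 0.3, 0.7, 0.8], edges=[0.5])
shifted = population_stability_index([0.1, 0.2, 0.6, 0.7], [0.6, 0.7, 0.8, 0.9], edges=[0.5])
```

PSI flags input or score drift only; degradation in calibration or Brier score can be confirmed only after the k-month outcome window has elapsed.
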
User presentation
- How to present a calibrated probability and recommendation to end users, including explanations and scenario analysis.
