
Design a time-series home-buy decision classifier

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in ML system design for time-series decisioning, including rigorous target and horizon definition, temporal feature engineering, time-aware validation, probabilistic calibration and thresholding, leakage controls, model selection trade-offs, and deployment monitoring.



Company: Citadel

Role: Data Scientist

Category: ML System Design

Difficulty: hard

Interview Round: Take-home Project

You are given monthly housing-market time series by region (e.g., price indices, mortgage rates, inventory, days-on-market, macro indicators). Frame this as a classification task to decide whether a buyer should purchase this month versus wait (or purchase within the next k months). Describe: how you would define the target label and decision horizon; data preprocessing and handling missing values; temporal feature engineering (lags, rolling statistics, deltas) and dealing with non-stationarity; time-based training/validation splits and walk-forward cross-validation; models you would try (e.g., logistic regression with time features, gradient boosting, sequence models) and why; evaluation metrics and cost-sensitive objectives reflecting asymmetric risks; methods to avoid look-ahead bias and leakage; approaches to detect and handle concept drift post-deployment; and how you would present a calibrated probability and recommendation to end users.


Aug 13, 2025, 12:00 AM

Take‑Home: Classifying Buy‑Now vs Wait Decisions in Housing Time Series

Context

You are given a monthly panel of regional housing and macro time series (e.g., price indices, mortgage rates, inventory, days‑on‑market, unemployment, CPI). The goal is to build a system that, for each region and month t, outputs a calibrated probability and a recommendation: buy now vs. wait (i.e., defer the purchase to some point within the next k months).

Task

Describe, at design level and with enough specificity to implement:

  1. Target and horizon
    • Define the decision horizon k and a rigorous target label y_t for month t.
    • Clarify economic assumptions and edge cases (e.g., transaction costs, right‑censoring).
  2. Data preprocessing
    • Panel alignment by region and month, handling multiple data vintages if applicable.
    • Missing‑value strategy, outliers, scaling, and seasonality/deflation adjustments.
  3. Temporal feature engineering
    • Lags, rolling statistics, deltas (m/m, y/y), seasonality dummies, and interaction features.
    • Handling non‑stationarity (e.g., differencing, deflation, time‑weighted fitting).
  4. Time‑aware validation
    • Train/validation/test splits that respect time.
    • Walk‑forward (rolling/expanding window) cross‑validation and hyperparameter tuning.
  5. Models
    • Baselines and candidate models (e.g., logistic regression with time features, gradient boosting, sequence models).
    • Rationale for choices given data size, interpretability, and regime risk.
  6. Metrics and decisioning
    • Discrimination and probabilistic metrics (ROC AUC, Brier score, calibration curves) and cost‑sensitive objectives reflecting asymmetric risks.
    • Derive a thresholding rule tied to user costs/utilities.
  7. Leakage controls
    • Methods to prevent look‑ahead bias and data leakage (including macro data release lags and revisions).
  8. Concept drift and monitoring
    • How to detect, diagnose, and handle drift post‑deployment; retraining cadence.
  9. User presentation
    • How to present a calibrated probability and recommendation to end users, including explanations and scenario analysis.
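
As an illustration of item 1, here is a minimal sketch of one possible label definition. It is an assumption made for illustration, not a definition given in the question: y_t = 1 ("buy now") when the price index k months ahead exceeds the current index by more than a margin that stands in for the net cost of waiting. The column names `region`, `month`, and `price_index`, the function name `make_labels`, and the default values are all hypothetical. Rows whose t+k observation does not exist yet are right‑censored and left unlabeled.

```python
import numpy as np
import pandas as pd

def make_labels(panel: pd.DataFrame, k: int = 6, margin: float = 0.02) -> pd.Series:
    """Hypothetical label: 1.0 if the index k months ahead beats today's index
    by more than `margin`, 0.0 otherwise, NaN where month t+k is unobserved.
    Assumes one row per region and month with no gaps in the monthly sequence."""
    panel = panel.sort_values(["region", "month"])
    # Shift within each region so one region's future never leaks into another.
    future = panel.groupby("region")["price_index"].shift(-k)
    fwd_return = future / panel["price_index"] - 1.0
    labels = (fwd_return > margin).astype(float)
    labels[fwd_return.isna()] = np.nan  # right-censored: outcome not yet known
    return labels.rename("y")

# Toy panel: region A rises, region B falls.
months = pd.date_range("2024-01-01", periods=5, freq="MS")
demo = pd.DataFrame({
    "region": ["A"] * 5 + ["B"] * 5,
    "month": list(months) * 2,
    "price_index": [100, 101, 103, 106, 110, 100, 100, 99, 98, 97],
})
demo["y"] = make_labels(demo, k=2, margin=0.02)
print(demo)
```

The censored rows should be dropped from training and evaluation rather than filled, since their outcome is unknown at labeling time.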
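
Items 4 and 7 can be sketched together. The generator below produces expanding‑window walk‑forward folds over the distinct months of the panel and leaves a gap between the end of training and the start of testing. It assumes that a label at month t uses data through t+k, so a gap of k months keeps training labels from overlapping the test window. The function name and parameters are hypothetical, and the months are plain integers only to keep the example short.

```python
from typing import Iterator, List, Sequence, Tuple

def walk_forward_splits(
    months: Sequence,
    min_train: int,
    test_size: int,
    gap: int,
) -> Iterator[Tuple[List, List]]:
    """Yield (train_months, test_months) pairs with an expanding training window.
    `gap` months are skipped after the training window; set it to the label
    horizon k so no training label depends on data inside the test window."""
    ordered = sorted(set(months))
    start = min_train
    while start + gap + test_size <= len(ordered):
        train = ordered[:start]
        test = ordered[start + gap : start + gap + test_size]
        yield train, test
        start += test_size  # advance by one test block per fold

folds = list(walk_forward_splits(range(12), min_train=4, test_size=2, gap=2))
for train, test in folds:
    print(f"train {train[0]}..{train[-1]}  test {test}")
```

Splitting on months rather than rows keeps all regions for a given month on the same side of the split, which matters in a panel where regions share macro shocks.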
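
For the thresholding rule in item 6, one standard derivation assumes a calibrated probability p that buying now is the right call, zero cost for correct recommendations, a cost c_fb for a wrong "buy now", and a cost c_fw for a wrong "wait". Recommending "buy now" has expected cost (1 - p) * c_fb and recommending "wait" has expected cost p * c_fw, so buying is preferred when p > c_fb / (c_fb + c_fw). The sketch below encodes that rule; the function names and cost values are hypothetical.

```python
def buy_threshold(cost_false_buy: float, cost_false_wait: float) -> float:
    """Probability above which 'buy now' has lower expected cost than 'wait',
    assuming calibrated probabilities and zero cost for correct decisions."""
    if cost_false_buy <= 0 or cost_false_wait <= 0:
        raise ValueError("costs must be positive")
    return cost_false_buy / (cost_false_buy + cost_false_wait)

def recommend(p_buy: float, cost_false_buy: float, cost_false_wait: float) -> str:
    """Map a calibrated probability to a recommendation using the cost-based threshold."""
    threshold = buy_threshold(cost_false_buy, cost_false_wait)
    return "buy now" if p_buy > threshold else "wait"

# A wrong "buy now" is assumed three times as costly as a wrong "wait".
print(buy_threshold(3.0, 1.0))
print(recommend(0.70, 3.0, 1.0))
print(recommend(0.80, 3.0, 1.0))
```

When a mistaken purchase is the more expensive error, the threshold moves above 0.5, so the system recommends buying only on stronger evidence.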

