Predict Stock Prices from Google Search Data | Two Sigma Interview Question

Company: Two Sigma

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Technical Screen

You are given access to historical Google search data (relative search volume over time for arbitrary query terms) along with standard historical market data (daily prices and trading volume) for a universe of stocks. Design a model that uses the search data to predict stock prices.

Walk through your approach end to end: how you frame the prediction target, which search queries you would use and what features you would build from them, what model you would fit, how you would train and validate it without fooling yourself, and how you would decide whether the resulting signal is genuinely useful. Expect the interviewer to keep pressing with "what else?" after each answer, so be prepared to go deeper on features, failure modes, and validation at every step.

```hint Pick the right target
Raw price levels are non-stationary (close to a random walk), so a model that "predicts the price" mostly learns yesterday's price. Think about predicting **forward returns** (or even volatility/volume) over a chosen horizon instead, and decide whether you want a time-series forecast for one asset or a **cross-sectional ranking** across many stocks.
```

```hint What search data actually gives you
Public search-volume data is **relative, normalized, sampled, and revised**: the number for a given week can change when it is re-downloaded later. Useful features are usually about *abnormal attention*: current search intensity versus that query's own trailing history (e.g., a z-score or log-change), not the raw level.
```

```hint The validation trap
The fastest way to a fake result here is temporal leakage: shuffled K-fold, features built with future information, or query terms picked because you already know they "worked" historically. Think **walk-forward evaluation**, point-in-time data, and an honest accounting of how many hypotheses you tested.
```
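The walk-forward evaluation mentioned in the validation hint can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function name `walk_forward_splits` and its parameters (`min_train`, `embargo`) are hypothetical, samples are assumed to be sorted by time, and the training window is assumed to expand rather than roll.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int,
    n_folds: int,
    min_train: int,
    embargo: int = 0,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs over time-ordered samples.

    Each test block lies strictly after its training window, and `embargo`
    samples between the two are dropped so that overlapping forward-return
    labels cannot leak from train into test.
    """
    if min_train + embargo >= n_samples:
        raise ValueError("not enough samples for the requested split")
    test_size = (n_samples - min_train - embargo) // n_folds
    if test_size < 1:
        raise ValueError("too many folds for the available samples")
    for k in range(n_folds):
        test_start = min_train + embargo + k * test_size
        test_end = test_start + test_size
        train_end = test_start - embargo  # expanding window stops before the gap
        yield list(range(train_end)), list(range(test_start, test_end))
```

If the label is an h-day forward return, setting `embargo` to at least h is one reasonable way to keep a training label from overlapping the test period. Hyperparameters would then be tuned only inside each training window.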

### Constraints & Assumptions

- Search data is available at daily (or weekly) granularity per query term, as a normalized relative-volume index rather than absolute counts, and only becomes available with some delay after the period it covers.
- The stock universe is a set of reasonably liquid equities with standard daily open/high/low/close/volume history.
- You may assume enough history (several years) to fit and evaluate a model, but the signal-to-noise ratio of any return-prediction problem is very low.
- The interviewer cares about modeling judgment and statistical honesty, not about production infrastructure.
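The availability delay in the first constraint is easy to get wrong in a backtest, so a small point-in-time lookup may help make it concrete. The sketch below assumes a fixed publication lag in calendar days and a list of observations sorted by period date; the name `latest_available` and the `lag_days` parameter are hypothetical. It does not model revisions, which would additionally require storing each downloaded vintage.

```python
from bisect import bisect_right
from datetime import date, timedelta
from typing import List, Optional, Tuple


def latest_available(
    observations: List[Tuple[date, float]],
    decision_date: date,
    lag_days: int,
) -> Optional[float]:
    """Return the most recent search value already published on `decision_date`.

    `observations` holds (period_date, value) pairs sorted by period_date.
    Assumption: the value for period d is published on d + lag_days.
    """
    cutoff = decision_date - timedelta(days=lag_days)
    period_dates = [d for d, _ in observations]
    i = bisect_right(period_dates, cutoff)  # number of periods with d <= cutoff
    return observations[i - 1][1] if i > 0 else None
```

For example, with a two-day lag, a model making a decision on January 3 would see the value for January 1, not the value for January 3.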

### Clarifying Questions to Ask

- What prediction horizon do you have in mind: next day, next week, or longer? Search data is slow-moving, so the horizon constrains everything downstream.
- Are we predicting a single stock's price path, or ranking a cross-section of many stocks (e.g., for a long-short strategy)?
- Should the output be a price level, a return, a direction (up/down), or would predicting volatility/volume also count as success?
- Exactly when does the search data for day $t$ become available, and can historical values be revised after the fact?
- Is success measured by statistical accuracy (e.g., correlation with realized returns) or by economic value (a cost-aware backtest)?
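Once the horizon and output type are agreed, the target can be written down explicitly. The sketch below assumes the answer to the questions above is "a return over a fixed horizon" and uses log returns; the function name `forward_log_returns` is hypothetical, and prices are assumed to be a time-ordered list of positive daily closes for one stock.

```python
import math
from typing import List, Optional


def forward_log_returns(prices: List[float], horizon: int) -> List[Optional[float]]:
    """Label each day t with log(P[t + horizon] / P[t]).

    The last `horizon` days have no realized future price yet, so their
    label is None and they must be excluded from training.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    labels: List[Optional[float]] = []
    for t in range(len(prices)):
        if t + horizon < len(prices):
            labels.append(math.log(prices[t + horizon] / prices[t]))
        else:
            labels.append(None)
    return labels
```

A direction target would simply take the sign of each label, and a cross-sectional target would rank these labels across stocks on each date.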

### What a Strong Answer Covers

- **Problem framing:** predicting returns (or volatility) rather than price levels, an explicit horizon, and a deliberate choice between time-series and cross-sectional formulations.
- **Data realism:** normalization/sampling/revision quirks of public search data, availability lag, point-in-time discipline, and query-selection issues (ambiguous tickers, company vs. product terms).
- **Feature construction:** abnormal-attention measures (z-scores, log-changes vs. trailing history), lags, spike indicators, and controls for known drivers like past returns, volume, and volatility.
- **Model choice matched to signal-to-noise:** starting simple and regularized before reaching for complex models, with a reasoned justification.
- **Leakage-free validation:** walk-forward or purged time-series splits, no shuffled cross-validation, hyperparameters tuned only on past data.
- **Honest evaluation and skepticism:** out-of-sample rank correlation / direction accuracy, cost-aware backtest versus simple baselines, and explicit treatment of multiple testing, reverse causality (attention chasing past returns), non-stationarity, and signal decay.
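As one possible reading of the abnormal-attention measures listed above, the following sketch computes a trailing z-score of log search volume. The name `abnormal_attention`, the use of `log1p`, and the choice to exclude the current day from its own baseline are assumptions made for illustration; the input is assumed to be a time-ordered list of non-negative index values for a single query.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional


def abnormal_attention(series: List[float], window: int) -> List[Optional[float]]:
    """Z-score of log search volume versus its own trailing `window` values.

    The baseline for day t uses days t-window .. t-1 only, so the current
    and all later observations never enter that day's mean or deviation.
    """
    logs = [math.log1p(v) for v in series]  # log1p tolerates zero index values
    out: List[Optional[float]] = []
    for t in range(len(logs)):
        if t < window:
            out.append(None)  # not enough history yet
            continue
        hist = logs[t - window:t]
        sd = pstdev(hist)
        out.append((logs[t] - mean(hist)) / sd if sd > 0 else None)
    return out
```

Because the index is normalized per query, comparing each query to its own history in this way sidesteps the fact that raw levels are not comparable across terms.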

### Follow-up Questions

- With thousands of stocks and an unlimited choice of query terms, how do you select queries without turning the whole exercise into data snooping?
- Your backtest shows a strong in-sample Sharpe ratio that collapses out of sample. Walk through how you would diagnose what went wrong.
- Search attention often *reacts* to price moves rather than leading them. How would you establish that your feature actually leads returns instead of lagging them?
- Suppose the firm already runs momentum and reversal signals. How would you test whether your search-based signal adds incremental value rather than repackaging what they already have?
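For the lead-versus-lag follow-up, one simple diagnostic is to correlate the feature with returns at several time shifts and see where the relationship peaks. The sketch below is only a first check under simplifying assumptions (a single stock, equal-length time-ordered lists, plain Pearson correlation, non-constant inputs); the names `pearson` and `lead_lag_profile` are hypothetical, and a fuller answer would also control for past returns.

```python
from statistics import mean
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def lead_lag_profile(
    feature: List[float], returns: List[float], max_shift: int
) -> Dict[int, float]:
    """Correlation of feature[t] with returns[t + k] for k in -max_shift..max_shift.

    Positive k: the feature precedes the return (what a predictive signal needs).
    Negative k: the return precedes the feature (attention reacting to price).
    """
    n = len(feature)
    profile: Dict[int, float] = {}
    for k in range(-max_shift, max_shift + 1):
        pairs = [(feature[t], returns[t + k]) for t in range(n) if 0 <= t + k < n]
        xs, ys = zip(*pairs)
        profile[k] = pearson(list(xs), list(ys))
    return profile
```

If the profile peaks at negative shifts and is flat at positive ones, the feature is more likely echoing past price moves than anticipating future ones.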
