This question evaluates a candidate's ability to design and rigorously evaluate a minute-level mean-reversion trading strategy, encompassing quantitative finance competencies such as signal construction, risk controls, position sizing, transaction-cost modeling, reproducible backtesting, and statistical validation; it belongs to the Analytics & Experimentation category within financial time-series and backtesting. It is commonly asked because it tests both practical implementation skills (reproducible backtests and walk-forward validation) and conceptual understanding (overfitting, data-snooping, and statistical significance testing), requiring a mix of practical application and conceptual statistical reasoning.
Given minute-level OHLCV data for a single equity over 180 trading days, design and backtest a simple mean-reversion strategy: Specify entry/exit rules, risk controls, position sizing, transaction costs, and slippage; implement a reproducible backtest that outputs PnL, win rate, average trade, max drawdown, volatility, Sharpe ratio, and turnover; perform a hold-out or walk-forward validation to assess robustness; discuss overfitting risks and use an appropriate statistical test (e.g., bootstrap or reality-check) to evaluate whether performance is statistically significant.
Quick Answer: This question evaluates a candidate's ability to design and rigorously evaluate a minute-level mean-reversion trading strategy, encompassing quantitative finance competencies such as signal construction, risk controls, position sizing, transaction-cost modeling, reproducible backtesting, and statistical validation; it belongs to the Analytics & Experimentation category within financial time-series and backtesting. It is commonly asked because it tests both practical implementation skills (reproducible backtests and walk-forward validation) and conceptual understanding (overfitting, data-snooping, and statistical significance testing), requiring a mix of practical application and conceptual statistical reasoning.
Minute-Level Mean-Reversion Strategy: Design, Backtest, Validation, and Significance
Context
You are given minute-level OHLCV data (open, high, low, close, volume) for a single equity over 180 regular trading days. Assume the data are split- and dividend-adjusted and cover regular trading hours only (no overnight trading, no extended-hours bars).
Your job is to design a simple intraday mean-reversion strategy and evaluate it rigorously — not just to report a Sharpe ratio, but to convince a skeptical interviewer that any edge you find is real rather than an artifact of data-snooping. The question has four parts:
Specify the complete strategy (signal, entry/exit, risk controls, sizing, costs).
Implement a reproducible backtest that emits a standard set of performance metrics.
Validate out-of-sample with a hold-out or walk-forward scheme.
Quantify overfitting risk and test statistical significance under data-snooping.
Throughout, you must explicitly defend against look-ahead bias and survivorship/data biases, and your fills and costs must be realistic.
Constraints & Assumptions
Bar frequency:
1-minute bars; US-equity regular session
≈390
bars/day, so
≈70,200
bars total over 180 days.
No overnight risk:
the strategy must be
flat at the close
of every day; no position is carried between sessions.
Single name:
one equity, so there is no cross-sectional dimension and survivorship within the panel is not an issue — but you must still account for splits/dividends and avoid implicitly assuming the asset "survived."
Reproducibility:
fixed random seeds for any sampling/bootstrap; same inputs
⇒
same outputs.
Capital / leverage:
assume a notional capital base (e.g., $1M) and a position cap so a single name cannot blow up the book.
State your own numbers
(thresholds, half-lives, cost in bps) explicitly as parameters — the interviewer cares that they are named and justified, not that they match a "right answer."
Clarifying Questions to Ask
A strong candidate scopes the whole problem before coding:
What is the
fill convention
— do signals computed on bar
t
execute at the
next bar's open
(
t+1
), the close, or the mid? (This determines the look-ahead guard.)
What
costs and slippage
should I assume — fixed bps, a spread + impact model, or exchange fee/rebate schedule? Is there a per-share commission?
Is there a
target risk level
(e.g., annualized volatility or daily vol target) the book should run at, or is sizing unconstrained?
Are there
data-quality issues
to expect — missing minutes, zero-volume bars, halts, DST/timezone shifts, opening/closing auctions?
What defines a
"trade"
for metrics like win rate and average trade PnL — a round trip (entry to flat), or each fill?
How many
parameter configurations
am I allowed to search over? (This drives how aggressive the multiple-testing correction must be.)
Part 1 — Strategy Specification
Fully specify a simple mean-reversion strategy. At minimum, define:
Signal:
a mean-reversion indicator (e.g., deviation of price from a short-term moving average, normalized by volatility into a z-score).
Entry/exit rules:
thresholds for entry, and exit conditions (reversion to the mean, time stop, stop-loss, end-of-day flatten).
Risk controls:
per-trade stop-loss, time stop, daily stop / kill-switch, position caps, participation caps, and no-trade windows (e.g., the first/last few minutes).
Position sizing:
an explicit rule (e.g., volatility targeting) with formulas and parameter values.
Transaction-cost & slippage model:
explicit commission and a spread/impact term, ideally a function of participation rate.
What This Part Should Cover
A
complete, unambiguous
rule set — someone else could implement it from the spec alone (every threshold, half-life, and cost has a stated value).
Risk controls that are
layered
(per-trade, per-day, and structural caps) rather than a single stop.
Sizing that
adapts to volatility
and respects a hard position/participation cap.
A cost model
realistic enough to kill spurious microstructure edge
, not a token "0.1%".
Part 2 — Reproducible Backtest
Implement a backtest of the Part 1 strategy. It must, at minimum, output:
Total and per-day PnL
,
win rate
,
average trade PnL
,
max drawdown
,
realized (annualized) volatility
,
Sharpe ratio
, and
turnover
.
The implementation must be reproducible and free of look-ahead: signals use information only up to bar t; execution happens no earlier than t+1.
What This Part Should Cover
A clean
decide-on-t / fill-at-t+1
loop with no leakage, and an explicit end-of-day flatten.
Correct metric definitions
— especially the Sharpe annualization, drawdown from the equity curve, and a consistent "trade" definition for win rate / average trade.
Costs applied at fills
(not just gross PnL), so reported numbers are net.
Reproducibility: seeds fixed, deterministic handling of missing/zero-volume bars.
Part 3 — Out-of-Sample Validation
Validate the strategy out-of-sample. Use either a single hold-out split or, preferably, walk-forward (rolling) validation: tune parameters only on in-sample windows, freeze them, and evaluate on the following untouched test window. Concatenate the test-window results into a single out-of-sample track record and report metrics on it.
What This Part Should Cover
A validation scheme with a
strict train/test boundary
and no parameter leakage across it.
Reporting on the
concatenated OOS track record
, not cherry-picked windows.
An explicit
in-sample-vs-out-of-sample comparison
; large degradation is read as overfitting or regime instability.
Awareness that even walk-forward leaks if the
grid was designed by looking at the full sample
first.
Part 4 — Overfitting Risk & Statistical Significance
Discuss the overfitting / data-snooping risks of this exercise, then apply a suitable statistical test (e.g., a bootstrap-based White's Reality Check, the Hansen SPA test, or a deflated Sharpe ratio) to assess whether the best strategy's performance is statistically significant after accounting for having searched over many parameterizations.
What This Part Should Cover
This part-level rubric covers the inference itself; the cross-cutting bias-control rubric is below.
A correct articulation of
why selecting the best of M configs inflates Type-I error
(data-snooping / multiple testing).
A test whose null distribution is the
max statistic across the candidate set
(Reality Check / SPA / deflated Sharpe), not a per-strategy
t
-test.
A
dependence-preserving bootstrap
(stationary or block), with a justified block length.
A sensible
decision rule
(
p
-value threshold) and what to do if it fails (shrink the model class, gather more data).
What a Strong Answer Covers
Across all four parts, the interviewer is listening for the discipline that separates a real quant from someone who just printed a backtest:
End-to-end coherence:
the costs/sizing in Part 1, the metrics in Part 2, the OOS scheme in Part 3, and the significance test in Part 4 all describe
the same
strategy with consistent assumptions.
Bias hygiene throughout:
no look-ahead (decide on
t
, fill at
t+1
), no survivorship/regime cherry-picking, realistic costs and participation caps, and reproducible seeds — stated as guardrails, not afterthoughts.
Skepticism toward one's own result:
treats a positive in-sample Sharpe as a hypothesis to be falsified (stress costs, delay fills, shrink the grid) rather than a result to be celebrated.
Microstructure awareness:
recognizes that minute-level mean reversion is easily just bid-ask bounce, and that the cost model is what decides whether the edge survives.
Follow-up Questions
Your OOS Sharpe is
half
the in-sample Sharpe. Walk me through how you'd decide whether that's overfitting, a regime change, or just sampling noise — and what each implies for deployment.
Suppose costs were
double
your assumption. How sensitive is the edge, and how would you re-run the significance test to reflect that uncertainty?
How would you extend this from a single equity to a
dollar-neutral, cross-sectional
mean-reversion book, and what new biases (e.g., survivorship in the universe) appear?
The reality-check
p
-value is
0.08
. What concrete actions do you take before either shipping or shelving the strategy?