Analyze expectations, correlations, and investment strategies
Company: Squarepoint
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
Consider the following independent quantitative questions.
---
### 1. Stopping game with three outcomes
You play a game consisting of independent rounds. At the start your cumulative reward is 0. In each round, one of three things happens:
- With probability `p1`, you **gain a fixed amount `a` dollars**, and the game continues to the next round.
- With probability `p2`, the game **stops immediately** and you **keep your accumulated reward**.
- With the remaining probability `1 - p1 - p2`, the game **ends immediately and your reward is reset to 0** (you lose everything earned so far).
Assume `0 < p1 + p2 < 1`, `p1 >= 0`, `p2 >= 0`, and `a > 0`.
**Question:** Derive a closed-form expression for the **expected payoff** of this game at the time it eventually ends, in terms of `p1`, `p2`, and `a`.
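A derived formula can be sanity-checked by simulation. The following is a minimal Monte Carlo sketch and is not part of the original question; the function name `simulate_stopping_game`, the trial count, and the seed are illustrative assumptions.

```python
import random


def simulate_stopping_game(p1, p2, a, n_trials=200_000, seed=0):
    """Estimate the expected payoff of the three-outcome game by simulation.

    Hypothetical helper for checking a hand-derived formula.
    Assumes p1 >= 0, p2 >= 0 and 0 < p1 + p2 < 1, as in the problem statement.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        reward = 0.0
        while True:
            u = rng.random()
            if u < p1:
                # Gain `a` dollars and continue to the next round.
                reward += a
            elif u < p1 + p2:
                # Stop and keep the accumulated reward.
                break
            else:
                # Game ends and the reward is reset to zero.
                reward = 0.0
                break
        total += reward
    return total / n_trials
```

Evaluating a candidate closed form at a specific point such as `p1 = 0.5`, `p2 = 0.3`, `a = 1.0` and comparing it with `simulate_stopping_game(0.5, 0.3, 1.0)` should give agreement up to Monte Carlo noise.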
---
### 2. Expected length of initially increasing run in a permutation
Let `x1, x2, ..., xn` be a **random permutation** of `n` distinct numbers, where all `n!` permutations are equally likely.
Define a random variable `Y` as the length of the **longest strictly increasing prefix** of the sequence, using the following rule:
- `Y = k` if both of the following hold:
  - `x1 < x2 < ... < xk`, and
  - either `k = n` (the entire sequence is strictly increasing) **or** `xk > x(k+1)` (the first descent occurs between positions `k` and `k+1`; because the numbers are distinct, `xk >= x(k+1)` is equivalent to `xk > x(k+1)`).
**Question:** Find `E[Y]` as a function of `n`.
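For small `n`, the definition of `Y` can be checked exactly by enumerating all `n!` permutations. The sketch below is an illustrative brute-force check, not a derivation; the helper names `increasing_prefix_length` and `expected_prefix_length` are assumptions, and the approach is only practical for small `n` (roughly `n <= 9`).

```python
from fractions import Fraction
from itertools import permutations


def increasing_prefix_length(seq):
    """Length of the longest strictly increasing prefix of a non-empty sequence."""
    k = 1
    while k < len(seq) and seq[k - 1] < seq[k]:
        k += 1
    return k


def expected_prefix_length(n):
    """Exact E[Y] for n >= 1, averaging over all n! equally likely permutations."""
    total = 0
    count = 0
    for perm in permutations(range(n)):
        total += increasing_prefix_length(perm)
        count += 1
    return Fraction(total, count)
```

For example, `expected_prefix_length(3)` returns `Fraction(5, 3)`: the six permutations of three items have prefix lengths 3, 2, 1, 2, 1, 1. Tabulating the result for several values of `n` gives exact targets against which a general formula can be tested.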
---
### 3. Why can a constant-dollar strategy make money?
A **constant-dollar strategy** in a market with one risky asset and cash is defined as:
- You target a fixed dollar amount `W` invested in the risky asset.
- After each price move, you **rebalance**: if the risky asset has gone up, you sell some shares to bring its dollar value back down to `W`; if it has gone down, you buy more shares to bring it back up to `W`. The remainder stays in cash.
Assume a volatile market where the risky asset price fluctuates up and down over time, but has approximately **zero long-term drift** in expectation.
**Question:** Explain, both intuitively and with a simple numerical or probabilistic argument, **why and under what conditions** such a constant-dollar rebalancing strategy can generate a positive expected profit ("volatility harvesting") even when the asset's long-run expected price change is roughly zero.
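A small numerical experiment can make the mechanics concrete. The sketch below is a simplified illustration under stated assumptions: no transaction costs, cash earns zero interest, rebalancing happens exactly at each observed price, and there is always enough cash to rebalance. The function names `constant_dollar_pnl` and `buy_and_hold_pnl` are hypothetical. Because the position is reset to `W` dollars before every move, each step contributes `W` times that step's simple return.

```python
def constant_dollar_pnl(prices, target_dollars):
    """Cumulative P&L of holding `target_dollars` in the asset before every move.

    Assumes frictionless rebalancing to the target after each price change.
    """
    pnl = 0.0
    for prev, curr in zip(prices, prices[1:]):
        simple_return = curr / prev - 1.0
        pnl += target_dollars * simple_return
    return pnl


def buy_and_hold_pnl(prices, initial_dollars):
    """P&L of investing `initial_dollars` once and never rebalancing."""
    return initial_dollars * (prices[-1] / prices[0] - 1.0)
```

On the round-trip path `[100, 110, 100]` with `W = 1000`, `constant_dollar_pnl` gives roughly `9.09` (a gain of 100 on the way up and a loss of about 90.91 on the way down), while `buy_and_hold_pnl` gives `0.0`. A single path like this only illustrates the mechanism; whether the expected profit is positive depends on how "zero drift" is defined for the price process, which is the condition the question asks about.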
---
### 4. Correlation constraints and correlation matrix properties
Let `a`, `b`, and `c` be random variables with correlations
- `corr(a, b) = rho_ab`,
- `corr(b, c) = rho_bc`,
where `-1 <= rho_ab <= 1` and `-1 <= rho_bc <= 1`.
1. **Range of corr(a, c):**
   - Let `rho_ac = corr(a, c)`. Use the fact that any **correlation matrix must be positive semidefinite (PSD)** to derive the **feasible range** of `rho_ac` in terms of `rho_ab` and `rho_bc`.
2. **Correlation matrix properties:**
   - List the key mathematical properties that any valid correlation matrix must satisfy (e.g., symmetry, diagonal elements, PSD, etc.).
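A derived range for `rho_ac` can be checked numerically by scanning candidate values and testing whether the resulting 3x3 matrix is PSD. The sketch below is an illustrative check, not the derivation itself; the names `is_valid_corr3` and `feasible_rho_ac_range`, the grid resolution, and the tolerance are assumptions. It relies on the fact that for a symmetric 3x3 matrix with unit diagonal and off-diagonal entries in `[-1, 1]`, the 1x1 and 2x2 principal minors are automatically nonnegative, so the determinant decides PSD.

```python
def is_valid_corr3(rho_ab, rho_bc, rho_ac, tol=1e-12):
    """Check whether [[1, rho_ab, rho_ac], [rho_ab, 1, rho_bc], [rho_ac, rho_bc, 1]] is PSD."""
    if any(abs(r) > 1 + tol for r in (rho_ab, rho_bc, rho_ac)):
        return False
    # Determinant of the 3x3 correlation matrix, expanded by hand.
    det = (1 + 2 * rho_ab * rho_bc * rho_ac
           - rho_ab ** 2 - rho_bc ** 2 - rho_ac ** 2)
    return det >= -tol


def feasible_rho_ac_range(rho_ab, rho_bc, steps=200):
    """Approximate the feasible interval for rho_ac on an evenly spaced grid over [-1, 1].

    Returns (low, high), or None if no grid point is feasible.
    """
    grid = [-1 + 2 * i / steps for i in range(steps + 1)]
    feasible = [r for r in grid if is_valid_corr3(rho_ab, rho_bc, r)]
    if not feasible:
        return None
    return min(feasible), max(feasible)
```

For instance, `feasible_rho_ac_range(0.8, 0.8)` can be compared against the endpoints produced by a closed-form answer. The scan is only accurate to the grid spacing (`0.01` with the default `steps`), so it is a consistency check rather than a proof.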
---
### 5. Multi-year mean and variance of returns
Consider an investment strategy whose **one-year simple return** (not log return) is a random variable `R` with:
- Mean `E[R] = mu`.
- Standard deviation `std(R) = sigma` (so variance `Var(R) = sigma^2`).
Assume annual returns for different years are **independent and identically distributed (i.i.d.)** copies of `R`:
- `R1, R2, ..., RT` for `T` years.
Define:
- The **T-year cumulative simple return** as `R_total = R1 + R2 + ... + RT` (ignoring compounding, treating each year's return as additive P&L for simplicity).
- The **average annual return** over `T` years as `R_avg = R_total / T`.
**Questions:**
1. Find `E[R_total]` and `Var(R_total)` in terms of `mu`, `sigma^2`, and `T`.
2. Find `E[R_avg]` and `Var(R_avg)` in terms of `mu`, `sigma^2`, and `T`.
3. Briefly comment on how the variance of the average return behaves as `T` increases, assuming the i.i.d. model holds.
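The answers to these three questions can be checked empirically by simulating many `T`-year paths. The sketch below is illustrative only: the problem specifies just the mean and standard deviation of `R`, so drawing returns from a normal distribution is an added assumption, as are the function name `simulate_multi_year`, the number of paths, and the seed.

```python
import random
import statistics


def simulate_multi_year(mu, sigma, T, n_paths=20_000, seed=0):
    """Simulate i.i.d. annual returns and summarize R_total and R_avg.

    Assumption (not in the problem): each annual return is Normal(mu, sigma).
    Returns sample means and population-style variances across simulated paths.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_paths):
        # Additive T-year return, ignoring compounding as in the definition above.
        totals.append(sum(rng.gauss(mu, sigma) for _ in range(T)))
    averages = [total / T for total in totals]
    return {
        "mean_total": statistics.fmean(totals),
        "var_total": statistics.pvariance(totals),
        "mean_avg": statistics.fmean(averages),
        "var_avg": statistics.pvariance(averages),
    }
```

Running this for several values of `T` with fixed `mu` and `sigma` (for example `T = 1, 5, 10, 20`) shows how the four quantities scale with the horizon, which can be compared against the derived expressions.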
Quick Answer: This multipart Machine Learning question evaluates probabilistic expectation and stopping-time reasoning, combinatorial expectations in random permutations, stochastic portfolio rebalancing and volatility-harvesting intuition, linear-algebraic properties of correlation matrices (including positive semidefiniteness), and the mean and variance of sums and averages of i.i.d. returns.