Minimum Common Pairwise Correlation Among Seven Identically Distributed Random Variables
Company: Jane Street
Role: Data Scientist
Category: Machine Learning
Difficulty: medium
Interview Round: Onsite
Suppose $X_1, X_2, \ldots, X_7$ are seven random variables defined on the same probability space. Each has mean $0$ and variance $1$, and they are identically distributed. In addition, every pair has the same correlation coefficient: $\operatorname{Corr}(X_i, X_j) = \rho$ for all $i \neq j$.
What is the minimum possible value of $\rho$? Prove that your bound is tight — that is, show both that no smaller value is achievable and that the minimum value can actually be attained by a valid joint distribution.
```hint Where to start
Look at the sum $S = X_1 + X_2 + \cdots + X_7$. Whatever the joint distribution is, one quantity associated with $S$ can never be negative — expand it in terms of the variances and pairwise covariances.
```
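Carried through with this problem's numbers, the expansion in the hint gives a one-line bound. This is a worked sketch that uses only the stated unit variances and the common correlation $\rho$ (so each covariance equals $\rho$, and there are $7 \cdot 6 = 42$ ordered pairs with $i \neq j$):

$$
\operatorname{Var}(S) = \sum_{i=1}^{7} \operatorname{Var}(X_i) + \sum_{i \neq j} \operatorname{Cov}(X_i, X_j) = 7 + 42\rho \ \ge\ 0 \quad\Longrightarrow\quad \rho \ \ge\ -\tfrac{1}{6}.
$$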
```hint Matrix view
The correlation matrix of the seven variables is the **equicorrelation matrix** $(1-\rho)I + \rho J$, where $I$ is the $7 \times 7$ identity and $J$ is the $7 \times 7$ all-ones matrix. A valid correlation matrix must be positive semidefinite, and this particular matrix has at most two distinct eigenvalues. Find them as functions of $\rho$.
```
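The spectral claim can be checked numerically. The sketch below assumes NumPy is available; `equicorrelation` is a hypothetical helper name introduced here for illustration, not something from the problem statement. It builds the matrix for $n = 7$ and compares the computed eigenvalues with the closed forms $1 + (n-1)\rho$ (once, along the all-ones direction) and $1 - \rho$ (with multiplicity $n - 1$).

```python
import numpy as np


def equicorrelation(n: int, rho: float) -> np.ndarray:
    """Return the n x n matrix (1 - rho) * I + rho * J (hypothetical helper)."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))


n = 7
rho = -1.0 / 6.0

# eigvalsh handles symmetric matrices and returns eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(equicorrelation(n, rho))
smallest = eigenvalues[0]   # closed form: 1 + (n - 1) * rho
largest = eigenvalues[-1]   # closed form: 1 - rho

# Pushing rho slightly below -1/6 makes the smallest eigenvalue negative,
# so the matrix is no longer a valid correlation matrix.
smallest_below = np.linalg.eigvalsh(equicorrelation(n, rho - 0.01))[0]

print(smallest, largest, smallest_below)
```

The binding eigenvalue is the one attached to the all-ones eigenvector, which is the same direction as the sum $S$ in the first hint.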
```hint Achievability
For the construction, start from i.i.d. variables $Y_1, \ldots, Y_7$ with finite, nonzero variance and center each one by the sample mean $\bar{Y} = \frac{1}{7}\sum_{k=1}^{7} Y_k$. What is the correlation between two centered variables $Y_i - \bar{Y}$ and $Y_j - \bar{Y}$?
```
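A minimal sketch of the centering construction follows. It assumes NumPy and, for the Monte Carlo part only, standard normal $Y_i$; any i.i.d. law with finite, nonzero variance would serve, and the seed and sample size are arbitrary choices. The exact part uses the centering matrix $C = I - J/n$: if $\operatorname{Cov}(Y) = I$, then the centered vector has covariance $C C^\top$.

```python
import numpy as np

n = 7

# Centering matrix: C @ y subtracts the mean of y from every coordinate.
C = np.eye(n) - np.ones((n, n)) / n

# With Cov(Y) = I, the centered vector C @ Y has covariance C @ C.T.
cov_centered = C @ C.T

# Convert the covariance matrix to a correlation matrix.
std = np.sqrt(np.diag(cov_centered))
corr = cov_centered / np.outer(std, std)

variance_before_scaling = cov_centered[0, 0]  # (n - 1) / n
common_rho = corr[0, 1]                       # -1 / (n - 1)

# Monte Carlo sanity check (assumption: standard normal Y_i, fixed seed).
rng = np.random.default_rng(0)
Y = rng.standard_normal((200_000, n))
# Center each row by its own mean, then rescale to unit variance.
X = (Y - Y.mean(axis=1, keepdims=True)) * np.sqrt(n / (n - 1))
empirical_corr = np.corrcoef(X, rowvar=False)

print(common_rho, empirical_corr[0, 1])
```

Because the construction treats every index symmetrically, the rescaled variables are exchangeable, hence identically distributed, with mean $0$ and variance $1$ as the problem requires.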
### Constraints & Assumptions
- All seven variables live on a common probability space, so the correlations between them are well defined.
- $\mathbb{E}[X_i] = 0$ and $\operatorname{Var}(X_i) = 1$ for every $i$, and the $X_i$ are identically distributed.
- All $\binom{7}{2} = 21$ pairwise correlations are equal to the same value $\rho$.
- No independence, Gaussianity, or other distributional assumption is imposed — the answer should hold over all valid joint distributions.
- A complete answer proves the lower bound *and* exhibits (or argues the existence of) a joint distribution that attains it.
### Clarifying Questions to Ask
- Are the variables required to be jointly defined on one probability space, so that pairwise correlations are meaningful? (Yes.)
- Does "identically distributed" refer to the marginal distributions only, or to full exchangeability of the joint distribution? (Equal marginals is the stated requirement; a symmetric construction naturally gives exchangeability.)
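- Is any particular family of distributions assumed (e.g., jointly Gaussian), or is the question over all possible joint distributions? (No family is assumed; the answer should hold over all valid joint distributions.)
- Do I need to demonstrate that the minimum is attainable, or only derive the lower bound? (Both: prove the lower bound and show it is attained.)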
### What a Strong Answer Covers
- A clean derivation of the lower bound from a non-negativity argument (variance of the sum, or positive semidefiniteness of the correlation matrix), not just a stated answer.
- The spectral view: identifying the eigenvalues of the equicorrelation matrix and which one binds.
- An explicit construction (or existence argument) showing the bound is attained, including verification of the resulting correlation.
- Generalization of the result to $n$ variables and the intuition for why many variables cannot all be strongly negatively correlated with each other.
### Follow-up Questions
- Generalize: for $n$ identically distributed, unit-variance variables with common pairwise correlation, what is the minimum $\rho$ as a function of $n$, and what happens as $n \to \infty$?
- What is the *maximum* possible common correlation, and why is that side of the constraint easier?
- At the minimum value of $\rho$, what can you say about the random variable $X_1 + \cdots + X_7$? What geometric picture does that correspond to?
- How does this constraint show up in practice — for example, when constructing a portfolio of mutually hedging assets or designing negatively correlated Monte Carlo samples?
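Two of the follow-ups above can be explored with a few lines of arithmetic. The sketch below uses only the standard library; `min_common_correlation` is a hypothetical helper name, and the formula $-1/(n-1)$ it returns is the general-$n$ version of the variance-of-the-sum bound, $n + n(n-1)\rho \ge 0$.

```python
import math


def min_common_correlation(n: int) -> float:
    """Lower bound -1/(n - 1) implied by Var(X_1 + ... + X_n) >= 0 (hypothetical helper)."""
    if n < 2:
        raise ValueError("need at least two variables")
    return -1.0 / (n - 1)


# The bound rises toward 0 as n grows: many variables cannot all be
# strongly negatively correlated with one another.
bounds = {n: min_common_correlation(n) for n in (2, 3, 7, 100)}

# At the bound for n = 7, Var(S) = n + n(n - 1) * rho vanishes,
# so the sum S equals its mean, 0, almost surely.
n = 7
var_of_sum = n + n * (n - 1) * min_common_correlation(n)

# Geometric picture: seven unit vectors with pairwise inner product -1/6,
# i.e. the vertices of a regular simplex centered at the origin.
angle_degrees = math.degrees(math.acos(min_common_correlation(n)))

print(bounds, var_of_sum, angle_degrees)
```

The maximum side needs no such argument: $\rho = 1$ is attained by taking $X_1 = X_2 = \cdots = X_7$, and no correlation can exceed $1$.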