Compute probabilities and expectations in random processes
Company: Optiver
Role: Data Scientist
Category: Statistics & Math
Difficulty: easy
Interview Round: Take-home Project
This is a rapid-fire **probability and expectation** round: a sequence of short, self-contained brain-teasers solved on a whiteboard. The interviewer expects a clean setup, the right technique, and an exact closed-form answer (or a tight numeric value when no clean closed form exists), each in a few minutes. Unless stated otherwise, all random choices are **uniform** and **independent**.
### Constraints & Assumptions
- A "fair die" is a six-sided die uniform on $\{1,2,3,4,5,6\}$; a "fair coin" is uniform on $\{H,T\}$.
- "Standard 52-card deck": 13 ranks, 4 suits each; draws are without replacement unless a part says "with replacement".
- Give **exact** values (fractions or closed forms) when they exist; otherwise reduce the problem to a finite linear system / DP and report the resulting numeric value.
- Each part is independent of the others.
### Clarifying Questions to Ask
- For "expected sum / product" until a stopping rule: are the stopping roll(s) themselves included in the sum/product? (Assume **yes** — every roll made counts.)
- For card draws: does rank-product treat A=1 and face cards J/Q/K = 11/12/13 (not 10)? (Yes, as specified per part.)
- For the grid and graph walks: what exactly is the absorbing set, and where precisely is the start? (See each part; for the $10\times10$ grid, "boundary" = the outer ring of cells and "center" = an inner cell at distance-from-edge maximal.)
- For "with replacement" sampling: are repeats within one person's draw allowed and counted as drawn? (Yes.)
- When a closed form is ugly, is a reduced linear-system/DP formulation plus a numeric answer acceptable? (Yes — show the setup, then the number.)
### Part 1
**Distinct digits.** Choose an integer uniformly at random from **1 to 10,000** inclusive. What is the probability that **all digits of its decimal representation are distinct** (no repeated digit)?
```hint Count by digit length
Bucket by number of digits (1-, 2-, 3-, 4-digit; plus the single value 10,000). Within each bucket, count distinct-digit numbers with a leading-digit-can't-be-zero argument, e.g. 3-digit: $9\cdot 9\cdot 8$.
```
#### What This Part Should Cover
- Correct case split by digit length with no-leading-zero handling.
- Remembering the boundary value 10,000 and whether it qualifies.
- Reducing favorable count over the exact denominator 10,000.
### Part 2
**Stop when > 4.** Roll a fair die repeatedly until you roll a value **greater than 4** (5 or 6). What is the **expected sum** of all rolls?
```hint One-step / conditioning
Condition on the first roll: with prob $1/3$ you stop (roll $\in\{5,6\}$); with prob $2/3$ you continue and restart, having added a value uniform on $\{1,2,3,4\}$. Set up $E = \tfrac13\,\mathbb E[\text{roll}\mid\text{stop}] + \tfrac23(\mathbb E[\text{roll}\mid\text{continue}] + E)$.
```
#### What This Part Should Cover
- Renewal / first-step decomposition into "stop" vs "continue" branches.
- Correct conditional means $\mathbb E[\text{roll}\mid 5,6]$ and $\mathbb E[\text{roll}\mid 1\text{–}4]$.
- Solving the self-referential equation for $E$.
### Part 3
**Three-card rank product.** From a standard 52-card deck, draw 3 cards uniformly without replacement. Map ranks $A=1$, $2$–$10$ as themselves, $J=11$, $Q=12$, $K=13$. What is the **expected value of the product** of the three ranks?
```hint Linearity won't do — use power sums
The product over a without-replacement triple expands via symmetric-function identities. With $S_p=\sum_{\text{cards}} r^p$ (note each rank value appears 4 times), the sum of products over ordered distinct triples is $S_1^3 - 3S_1S_2 + 2S_3$; divide by the ordered count $52\cdot51\cdot50$.
```
#### What This Part Should Cover
- Recognizing dependence (without replacement) and using power sums $S_1,S_2,S_3$.
- Correct Newton-style identity for $\sum_{i\ne j\ne k} r_i r_j r_k$.
- Consistent ordered (or unordered) counting in numerator and denominator.
### Part 4
**Cancel H/T pairs.** Toss **100 fair coins**. Repeatedly remove one Head together with one Tail until you can't; the number left equals the number of unmatched flips. What is the **expected remaining number of coins**?
```hint It's an absolute difference
After cancellation the remainder is $|H-T|$ where $H\sim\text{Bin}(100,\tfrac12)$. Equivalently it's $|S_{100}|$ for a $\pm1$ simple random walk; use the known mean-absolute-displacement identity $\mathbb E|S_{2m}| = 2m\binom{2m}{m}/2^{2m}$.
```
#### What This Part Should Cover
- Reducing "cancel pairs" to $|H-T|$.
- Knowing/deriving $\mathbb E|S_{2m}|$ for the symmetric walk.
- Producing the exact form $100\binom{100}{50}/2^{100}$ and its numeric value.
### Part 5
**Stop on two identical in a row (sum).** Roll a fair die until you first see **two consecutive equal rolls**. What is the **expected sum** of all rolls?
```hint Decouple count from per-roll mean
After the first roll, each subsequent roll matches its predecessor with prob $1/6$, so the number of rolls is $1+\text{Geom}(1/6)$. Argue that conditioning on the matching event does not bias the per-roll mean away from $3.5$ (each face is equally likely to be the repeated one), so $\mathbb E[\text{sum}]=\mathbb E[\#\text{rolls}]\cdot 3.5$ — or verify with a 6-state linear system on the previous face.
```
#### What This Part Should Cover
- Computing $\mathbb E[\#\text{rolls}]$ from the geometric stopping rule.
- Justifying that the stopping rule leaves the per-roll mean at $3.5$ (symmetry across faces).
- Combining the two for the expected sum.
### Part 6
**Run of 3 in 8 flips.** Toss a fair coin **8 times** (order matters). What is the probability the sequence contains **at least one run of 3 consecutive identical outcomes** (HHH or TTT anywhere)?
```hint Count the complement with a DP
Count sequences with **no** run of length $\ge 3$ via a small DP over states (current symbol, current run length $\in\{1,2\}$); subtract from $2^8=256$.
```
#### What This Part Should Cover
- Switching to the complement (no run of 3).
- A correct run-length DP / recurrence (Tribonacci-style transitions).
- Final probability as $1-(\text{valid})/256$ reduced.
### Part 7
**1D walk with die-based steps.** Start at position 0. Each step, roll a die: rolls $1,2,3$ move **right** by that amount; rolls $4,5,6$ move **left** by $(\text{roll}-3)$ (i.e. $-1,-2,-3$). What is the **expected number of steps** until $|\text{position}|\ge 10$?
```hint First-step + symmetry
The step distribution is symmetric ($\pm1,\pm2,\pm3$ each w.p. $1/6$, mean 0). Define $E(x)$ = expected steps to absorption; $E(x)=0$ for $|x|\ge10$, else $E(x)=1+\tfrac16\sum_{s} E(x+s)$. By $E(x)=E(-x)$ this is 10 unknowns; solve the linear system.
```
```hint Wald-style sanity check
$\mathbb E[\Delta^2]=\tfrac{1}{6}(1+4+9+1+4+9)=\tfrac{14}{3}$. Since $X_n^2-\tfrac{14}{3}n$ is a martingale, $\mathbb E[\tau]=\mathbb E[X_\tau^2]/(14/3)$, giving a bracket you can use to check the linear-system answer.
```
#### What This Part Should Cover
- Setting up the absorbing first-step recurrence with overshoot handled (terms with $|\cdot|\ge10$ are 0).
- Exploiting the $x\leftrightarrow -x$ symmetry to shrink the system.
- Optional martingale bound as a cross-check.
### Part 8
**Two 3-digit numbers, two-digit difference.** Pick two integers independently and uniformly from **100 to 999**. What is the probability their **absolute difference** is a **two-digit number** (between 10 and 99 inclusive)?
```hint Count pairs by gap size
With $N=900$ values, the number of ordered pairs at gap exactly $d\ge1$ is $2(N-d)$. Sum over $d=10,\dots,99$ and divide by $N^2$.
```
#### What This Part Should Cover
- Counting ordered pairs at a fixed absolute gap $d$.
- Summing the arithmetic series over $d\in[10,99]$.
- Reducing over $N^2=810{,}000$.
### Part 9
**Two particles on an octagon.** On a cycle of 8 vertices, place two particles at opposite vertices (distance 4 apart). Each second, each particle independently steps clockwise or counterclockwise by one (fair coin). What is the **expected number of seconds** until they occupy the **same vertex**?
```hint Track the gap, not both particles
The (signed) gap changes by the difference of two $\pm1$ steps: $0$ w.p. $1/2$, $+2$ w.p. $1/4$, $-2$ w.p. $1/4$. Parity of the gap is preserved, so the even-distance states $\{0,2,4,6\}$ form the chain; meeting is gap $\equiv 0 \pmod 8$. Set $E_0=0$, use $E_2=E_6$ by symmetry, and solve.
```
#### What This Part Should Cover
- Reducing two walkers to one gap chain on $\{0,2,4,6\}$.
- Correct transition probabilities $\tfrac12/\tfrac14/\tfrac14$ for the gap.
- Using symmetry $E_2=E_6$ and solving for $E_4$.
### Part 10
**Per-face running sums to 100.** Roll a fair die repeatedly. For each face $i\in\{1,\dots,6\}$ track $S_i = i\times(\text{count of }i)$. Stop as soon as **any** $S_i\ge 100$. At the stopping time $\tau$, what is the **expected number of even rolls** (2, 4, 6) observed?
```hint Even-count is half the roll-count
Whether a roll is even (prob $\tfrac12$) is independent of its value, and the stopping rule depends only on values. $M_n=(\#\text{evens in }n)-\tfrac{n}{2}$ is a martingale, so by optional stopping $\mathbb E[\#\text{evens}]=\tfrac12\mathbb E[\tau]$. The work is computing $\mathbb E[\tau]$.
```
```hint Get E[τ] by a tail sum
$\tau$ stops when count $i$ first hits $m_i=\lceil 100/i\rceil$ (so $m=100,50,34,25,20,17$ for $i=1..6$). Use $\mathbb E[\tau]=\sum_{n\ge0}\mathbb P(\tau>n)$, where $\mathbb P(\tau>n)=\dfrac{n!}{6^n}[x^n]\prod_{i=1}^{6}\Big(\sum_{k=0}^{m_i-1}\tfrac{x^k}{k!}\Big)$ — i.e. all six counts stay below threshold.
```
#### What This Part Should Cover
- The independence/martingale argument linking even-count to $\tfrac12\mathbb E[\tau]$.
- Translating the $S_i\ge100$ rule into count thresholds $m_i=\lceil100/i\rceil$.
- A correct multinomial tail-sum (or absorbing-DP) for $\mathbb E[\tau]$ and a numeric value.
### Part 11
**Thirteen cards, no aces.** Draw 13 cards uniformly without replacement from a standard deck. What is the probability the hand contains **no aces**?
```hint Hypergeometric
Choose all 13 from the 48 non-aces: $\binom{48}{13}/\binom{52}{13}$.
```
#### What This Part Should Cover
- Recognizing the hypergeometric "all from the safe pile" count.
- Correct ratio of binomials.
### Part 12
**2D walk to the boundary of a $10\times10$ grid.** A particle starts at the center of a $10\times10$ grid (interior cells; the outer ring is absorbing boundary). Each second, flip two fair coins: HH = North, TT = South, HT = West, TH = East (one step each, the four directions equally likely). What is the **expected time** to hit the boundary?
```hint Discrete Poisson / Dirichlet problem
Let $E(x,y)$ be expected time; $E=0$ on the boundary ring, and for interior cells $E(x,y)=1+\tfrac14\sum_{\text{4 neighbors}}E$. Solve the linear system (use the grid's two-axis symmetry to cut unknowns), then read off the center value.
```
#### What This Part Should Cover
- Recognizing the four directions are equiprobable ($\tfrac14$ each) and setting up the harmonic recurrence with $E=0$ on the boundary.
- Exploiting reflective symmetry to reduce the system.
- A numeric expected hitting time from the center.
### Part 13
**$\mathbb E[2^{X_1+X_2+X_3}]$.** Roll 3 fair dice. Compute $\mathbb E\!\left[2^{X_1+X_2+X_3}\right]$.
```hint Factor over independence
$2^{X_1+X_2+X_3}=\prod_k 2^{X_k}$, and independence gives $\big(\mathbb E[2^{X}]\big)^3$ with $\mathbb E[2^X]=\tfrac16(2+4+8+16+32+64)$.
```
#### What This Part Should Cover
- Factoring the MGF-like quantity over independent dice.
- Computing $\mathbb E[2^X]$ and cubing.
### Part 14
**Stop on two identical in a row (product).** Roll a fair die until you first see two consecutive equal results. What is the **expected product** of all rolls?
```hint Compare growth vs. stopping decay
Set up the conditional-expectation system $M_f=\tfrac16 f+\sum_{g\ne f}\tfrac16\,g\,M_g$ (expected product to come given last face $f$). Examine whether it has a finite nonnegative solution — i.e. check the spectral radius of the multiplicative operator $B_{f,g}=\tfrac{g}{6}\,[g\ne f]$ against 1.
```
#### What This Part Should Cover
- Recognizing this is *not* "expected sum" — the product can grow geometrically while stopping decays only geometrically.
- Formalizing via the linear system / operator spectral radius (which exceeds 1).
- Concluding the expectation **diverges**.
### Part 15
**Bankruptcy in 10 rolls.** Start with \$10. Roll a fair die exactly 10 times. If the roll is even, **add** its face value to your money; if odd, **subtract** it. What is the probability you go **bankrupt** (money $\le 0$) at **any** step during the 10 rolls?
```hint Absorbing first-passage DP
Increments are $+2,+4,+6,-1,-3,-5$ each w.p. $1/6$. Let $p(t,w)$ = probability of *ever* hitting $\le0$ within $t$ remaining rolls from wealth $w$; with $p(t,w)=1$ for $w\le0$, $p(0,w)=0$ for $w>0$, and $p(t,w)=\tfrac16\sum_\Delta p(t-1,w+\Delta)$. Answer is $p(10,10)$.
```
#### What This Part Should Cover
- Listing the six signed increments correctly.
- A first-passage (ruin-at-any-step, not just terminal) DP over (rolls remaining, wealth).
- Numeric ruin probability $p(10,10)$.
### Part 16
**No overlap in two length-3 samples from $\{1,\dots,10\}$.** You and a friend each independently draw 3 numbers from $\{1,\dots,10\}$ **with replacement**. What is the probability that **none** of your numbers appears among your friend's three draws?
```hint Condition on how many distinct values you drew
Let $K$ = number of distinct values in your 3 draws ($K\in\{1,2,3\}$). Compute $\mathbb P(K=k)$, then the friend avoids all $k$ values with probability $((10-k)/10)^3$; sum over $k$.
```
#### What This Part Should Cover
- Getting the distribution of distinct-value count $K$ (surjection / Stirling-style counts).
- The conditional avoidance probability $((10-k)/10)^3$.
- Combining into the total probability.
### Part 17
**Last roll is 2 when the sum first exceeds 100.** Roll a fair die repeatedly until the running total first **exceeds 100**. What is the probability the **last roll** is exactly 2?
```hint Recurse on the deficit (renewal)
Let $h(\text{rem})$ = probability the overshooting roll equals 2 when you still need to add more than $\text{rem}{-}1$ (deficit $=$ target $-$ current). From deficit $r$: a roll $k\ge r$ ends (counts if $k=2$), else recurse to $r-k$. Start at $r=101$. Recognize this as a renewal/overshoot recursion and ask what the long-run frequency of each face as the overshooting roll should be.
```
#### What This Part Should Cover
- Reframing "exceeds 100" as a deficit/overshoot recursion.
- Correctly handling which rolls *end* the process vs. continue.
- Computing the limiting overshoot probability for the face value 2 and verifying convergence at the given target.
### What a Strong Answer Covers
Across all parts, a strong candidate demonstrates the same handful of probabilistic tools applied cleanly and chooses the lightest one per problem:
- **First-step / renewal conditioning** for stopping-rule expectations (Parts 2, 5, 14, 17).
- **Absorbing-state linear systems / DP** for hitting times and ruin (Parts 7, 9, 12, 15), including symmetry reductions.
- **Counting & symmetric functions** for static probabilities (Parts 1, 3, 6, 8, 11, 16).
- **Martingale / linearity shortcuts** to avoid heavy computation (Parts 4, 10, 13).
- Knowing **when a clean closed form exists vs. when to reduce to a numeric system**, and recognizing a **divergent** expectation (Part 14) rather than forcing a finite number.
- Stating assumptions (inclusion of stopping rolls, absorbing sets) before computing.
### Follow-up Questions
- In Part 5, the *expected sum* equals $\mathbb E[\#\text{rolls}]\times 3.5$, but does the *variance* of the sum factor as cleanly? Why or why not?
- For Part 7, how does the expected absorption time scale if the barrier is at $|x|\ge L$ for large $L$, and what does the diffusion approximation predict?
- Part 14 diverges; what *does* converge — e.g. is $\mathbb E[\log(\text{product})]$ finite, and what is it?
- In Part 17, why does the finite-$n$ answer $q(100)$ already match the renewal limit $2/21$ to many digits, and roughly how fast does $q(n)\to 2/21$?
Quick Answer: This question evaluates proficiency in probability theory, combinatorics, and mathematical expectation through a series of rapid-fire brain-teasers. It tests the ability to set up clean probabilistic models, apply the right technique, and derive exact closed-form answers under time pressure — skills commonly assessed in quantitative and data science interviews.