Solve probability and game-theory puzzles
Company: Jane Street
Role: Data Scientist
Category: Statistics & Math
Difficulty: easy
Interview Round: Technical Screen
You are interviewing for a **quant trading internship**. Work through the following probability and game-theory puzzles. There is no need to answer instantly — taking a minute or two to reason or compute before committing to an answer is expected. For each puzzle, state your strategy and reasoning clearly, then give the requested numerical result where one is asked for.
The interviewer is evaluating *how you think*, not how fast you recall a fact. A clean argument — an upper bound, a posterior, a Bellman/stopping rule, an equilibrium concept — matters more than the final number.
### Clarifying Questions to Ask
- For the bottle game: do I get to choose my actions adaptively based on past removal-success feedback, or must I fix a plan in advance? (It affects whether the answer is a fixed split or a policy.)
- For the coin game: is the bias $p$ fixed for the whole sequence (drawn once from the prior), or re-drawn each flip? And do I observe each outcome *before* my next guess?
- For the die game: when I "stop," is my payout exactly the accumulated $S$, and does rolling a 1 wipe $S$ to $0$ with no further recourse?
- For the two-player die game: do I observe my opponent's running total or stopping decision, or are the sequences fully independent and hidden until the end?
- For the dice-bidding game: are bids restricted to strictly increasing integers, who bids first, and is the payoff purely binary (win/lose the challenge)?
### What a Strong Answer Covers
The interviewer is looking for these signals across the set — name the principle, then compute:
- **Upper/lower bounds before optimization** — can you cap what's achievable (e.g. a counting argument) before searching for the optimal strategy?
- **Bayesian updating** — recognizing a conjugate prior, writing the posterior, and using the posterior-predictive probability rather than a point estimate.
- **Optimal stopping / dynamic programming** — setting up a Bellman equation, finding the indifference threshold, and distinguishing a one-step look-ahead from the full value function.
- **Game theory under incomplete information** — shifting the objective from "maximize my expectation" to "maximize my chance of winning," reasoning about what an opponent's action *reveals*, and the idea of equilibrium (including when randomization/bluffing is required).
- **Numerical follow-through** — committing to a clean closed form or a defensible approximation, and sanity-checking it (limits, symmetry, trivial baselines).
---
## 1) Two bottles, add/remove balls (100 rounds)
You have **two bottles** (A and B), initially empty, and an unlimited supply of identical balls. The game lasts exactly **100 rounds**.
In each round you choose **exactly one** of two actions:
- **Add** — One ball is placed into a **uniformly random** bottle (A or B). You do **not** observe which bottle received the ball.
- **Remove** — One bottle is chosen **uniformly at random** (A or B). If that bottle is non-empty, you remove **one** ball and you observe that the removal succeeded. If that bottle is empty, you remove nothing.
The only feedback you ever get is **whether a removal attempt succeeded** — you never see which bottle was acted on, nor the contents of either bottle.
**Goal:** maximize the **expected number of balls removed** over the 100 rounds.
**Tasks:**
1. Describe an optimal strategy.
2. Compute the expected number of balls removed under that strategy.
```hint Find the ceiling first
Before optimizing, *bound* what's possible. With a fixed number of total rounds split between adds and removes, what two simple counting limits constrain how many balls you could ever remove? Treat the split as a variable and ask where the cap is largest.
```
```hint Ordering matters
Once you fix the split, think about whether adds and removes should be interleaved or separated. Which ordering tends to keep more inventory present *before* a removal is attempted, and why might removing early waste attempts?
```
```hint Computing the expectation
Try to model the contents of one bottle after the adds, and the removal attempts that happen to target that same bottle, as two random quantities of the same type. A failed attempt corresponds to an *imbalance* between attempts and inventory in a bottle — can you express total failures as a single absolute-difference quantity, then estimate its expectation?
```
---
## 2) 100 coin flips with unknown bias
A coin is flipped **100 times**. The probability of Heads is an unknown parameter \(p\), with prior \(p \sim \mathrm{Unif}(0,1)\).
**Before each flip**, you must guess Heads or Tails. You earn **\$1** for each correct guess (and \$0 for an incorrect one). You see the outcome of every flip after guessing it, so your guesses may depend on the flips seen so far.
**Tasks:**
1. What guessing strategy maximizes your expected total winnings?
2. Let \(X\) be the price to play, paid upfront. Treating yourself as **risk-neutral**, for what values of \(X\) would you be willing to play?
```hint Conjugate prior
A $\mathrm{Unif}(0,1)$ prior is a special case of a $\mathrm{Beta}$ prior. After observing some heads and tails, what is the posterior over $p$, and what does it imply for the probability that the *next* flip is Heads? There is a classical named result for exactly this quantity.
```
```hint What does scoring per-flip imply?
Each guess earns \$1 independently of the others, and you see the outcome before the next guess. Does maximizing the *total* expectation force you to plan ahead, or can it be decomposed flip-by-flip? Once you decide that, the per-flip decision follows from your posterior-predictive probability.
```
```hint Pricing
A risk-neutral player plays iff the price is below the expected payoff. So you need $\mathbb{E}[\text{winnings}]$ (a sum or estimate over the 100 flips). What does the per-flip accuracy approach as data accumulates, and how does the whole thing compare against a trivial data-ignoring baseline like "always guess Heads"?
```
---
## 3) Stop/continue with a 10-sided die (single-player and competitive)
A fair 10-sided die (faces 1–10) is rolled repeatedly. You keep a running profit \(S\), starting at \(S = 0\). On each roll:
- Roll **2–10**: that value is added to \(S\).
- Roll **1**: the game **ends immediately** and your final payout is **\$0** (your accumulated \(S\) is wiped out).
**Before each roll** (while the game is still going), you may instead choose to **stop** and take your current \(S\) as your payout.
**Tasks — single-player:**
Determine the optimal stopping rule.
```hint One-step look-ahead
Compare "stop now" against "roll exactly once more, then stop." Each continued roll has a known bust probability and, when it doesn't bust, adds a value with a known expectation. Write the value of rolling once in terms of your current total, and ask when it beats stopping.
```
```hint Why the threshold is exact
An extra roll trades a small chance of destroying your *entire* accumulated total against a likely gain. Set the expected gain from continuing equal to the expected loss to locate the indifference point. A full Bellman/backward-induction argument should confirm that this one-step boundary is the true stopping boundary — can you see why no extra "option value" shifts it here?
```
**Tasks — two-player competition:**
Two players each play this same game with their **own independent die sequence**, and each chooses when to stop. The player with the **higher final payout** is paid **their own** final payout by a third party; the loser receives **\$0**. (If the payouts tie, they split equally.) Describe how optimal play changes relative to the single-player case.
```hint The objective changed
You're no longer maximizing $\mathbb{E}[S]$ — you now care about *finishing ahead* of the opponent's final-score distribution. Re-derive what you're actually optimizing, and ask whether maximizing your mean and maximizing your chance of out-scoring the opponent point the same way or pull your threshold in different directions.
```
```hint Don't expect a clean pure threshold
Treat this as a game: best-respond to an assumed opponent threshold, then best-respond to *that*, and watch what the iteration does. Does it converge to one shared number, or not? Whatever you conclude about a single closed-form answer, the most defensible deliverable is the *direction* the competition pushes you relative to the single-player rule.
```
---
## 4) Bidding the sum of two dice with private information
Two fair six-sided dice are rolled, one assigned to each player.
- You see **only your** die outcome \(a\).
- Your opponent sees **only their** die outcome \(b\).
Players alternate turns in an **ascending-bid** game about the total sum \(a + b\):
- On your turn, you either **raise** to a strictly higher integer bid \(k \in \{2, \dots, 12\}\), or **challenge** the previous bid.
- A bid \(k\) is a claim that the true sum satisfies \(a + b \ge k\).
- When a player challenges, both dice are revealed:
- If \(a + b \ge k\), the **last bidder** (the one who made bid \(k\)) wins.
- Otherwise, the **challenger** wins.
**Task:** Describe an optimal strategy — how to decide whether to raise or challenge, and what value to raise to — as a function of your observed die value \(a\) and the bid history.
```hint Your posterior over the sum
You know $a$; the opponent's $b$ is uniform on $\{1,\dots,6\}$. From your seat, what is the distribution of the sum, and hence $\Pr(a+b \ge k \mid a)$ as a function of $k$? Identify the two extreme regimes of $k$ where the claim becomes certainly true or certainly false given your die — those bound what you can safely support.
```
```hint What does a binary payoff reward?
The payoff is purely win/lose, not proportional to the margin. For a payoff like that, what is the single comparison that decides whether an event is worth betting on at any node? Frame both "challenge the current bid" and "raise to a new bid" in terms of that comparison applied to your success probability.
```
```hint What does a raise tell you?
Ask why a rational opponent would voluntarily raise to a high number — what does that reveal about their hidden die? Should you keep using the flat prior on $b$ after seeing the bid history, or condition on it? Then turn it around: your own bids leak information too. What does that symmetry imply about whether a purely honest, fully-revealing strategy can be safe against a competent opponent?
```
---
### Follow-up Questions
After your main answers, expect deeper probes such as:
- **Bottle game:** how does the optimal split and expected count change if the game lasts $2N$ rounds instead of 100? What if removal feedback (success/failure) could actually be used adaptively — does it help?
- **Coin game:** how does the expected payoff change for a general $n$ flips, and what is the per-flip accuracy in the limit of large $n$? How would a *risk-averse* player adjust the price they'd pay?
- **Die game:** if you *could* observe the opponent's already-locked final score before finishing, what is the optimal rule? How does the two-player threshold respond as the opponent plays more or less aggressively?
- **Dice bidding:** roughly how much bluffing is required at equilibrium, and why does always-honest bidding lose to a competent opponent? How does the analysis extend if each player rolls two dice instead of one?
Quick Answer: This set of puzzles evaluates probabilistic reasoning, Bayesian inference, expected-value optimization, optimal stopping, and strategic game-theoretic reasoning under uncertainty and asymmetric information, situated in the Statistics & Math domain.