Fermi Estimation with Confidence Intervals: How Many Houses Does One Year of US Netflix Spending Buy?
Company: Optiver
Role: Data Scientist
Category: Machine Learning
Difficulty: easy
Interview Round: Technical Screen
Estimate how many houses could be purchased with the total amount that people in the United States spend on Netflix subscriptions in one year. You start with no reference data — build the estimate from quantities you can reason about, and be explicit about every assumption.
This is a Fermi-estimation question from a quantitative interview round. The interviewer cares less about the final number than about your decomposition, your calibration, and how you update when given new information.
### Constraints & Assumptions
- "Spending on Netflix" means consumer subscription payments by US subscribers over one calendar year (ignore advertising revenue and corporate accounts).
- "Houses" means typical US homes at the national median price; assume the purchases are at market price with no bulk discount.
- No external data sources: all inputs must be common-knowledge magnitudes you estimate and justify on the spot.
- A 90% interval means you would be willing to lay 9-to-1 odds (risk 9 units to win 1) that the true value falls inside it.
### Clarifying Questions to Ask
- Should I count only direct subscription revenue from US customers, or include advertising and other revenue lines?
- Is a "house" the median-priced US home, or homes in a specific market?
- Do you want a point estimate first and then the interval, or the interval directly?
- How precise should the arithmetic be — order of magnitude, or a defensible point value with clean rounding?
### Part 1
Produce a point estimate for the number of houses, and give an interval that you are 90% confident contains the true answer.
```hint Decompose into estimable factors
Annual US Netflix spend $=$ (US subscribers) $\times$ (average monthly price per subscription) $\times$ 12. Divide by the median US home price to get the number of houses. Each factor is something you can bound from everyday knowledge.
```
```hint Intervals for a product
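Your uncertainty is multiplicative, so work in log space. If the factors are independent and roughly log-normal, their log-variances add: $\sigma_{\text{total}}^2 = \sum_i \sigma_i^2$. The 90% interval is then the point estimate (taken as the median) multiplied and divided by $e^{1.645\,\sigma_{\text{total}}}$. Do not just multiply the extreme values of all three factors together: all three being extreme at once is much less likely than any one being extreme, so that interval covers far more than 90%.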
```
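The two hints above can be sketched in a few lines of Python. Every numeric range here is an illustrative guess made up for this sketch, not reference data, and the helper names (`log_params`, `estimate_houses`, `naive_extremes`) are hypothetical. The sketch assumes each factor is independent and log-normal, and that each (low, high) pair is that factor's central 90% range.

```python
import math

Z90 = 1.645  # z-score bounding the central 90% of a normal distribution

# ASSUMPTION: illustrative 90% ranges (low, high), guessed for this sketch only.
RANGES = {
    "us_subscribers": (40e6, 120e6),
    "monthly_price_usd": (10.0, 20.0),
    "median_home_price_usd": (300e3, 500e3),
}

def log_params(low, high):
    """Median and log-standard-deviation of a log-normal whose
    central 90% range is [low, high]."""
    median = math.sqrt(low * high)            # geometric midpoint
    sigma = math.log(high / low) / (2 * Z90)  # half-width in log space / z
    return median, sigma

def estimate_houses(ranges):
    """Point estimate and 90% interval for houses bought per year of spend."""
    subs, s_subs = log_params(*ranges["us_subscribers"])
    price, s_price = log_params(*ranges["monthly_price_usd"])
    home, s_home = log_params(*ranges["median_home_price_usd"])
    point = subs * price * 12 / home
    # Independent log-normal factors: log-variances add.
    sigma_total = math.sqrt(s_subs**2 + s_price**2 + s_home**2)
    factor = math.exp(Z90 * sigma_total)
    return point, point / factor, point * factor

def naive_extremes(ranges):
    """Interval from stacking every factor's extreme value (too wide)."""
    s_lo, s_hi = ranges["us_subscribers"]
    p_lo, p_hi = ranges["monthly_price_usd"]
    h_lo, h_hi = ranges["median_home_price_usd"]
    return s_lo * p_lo * 12 / h_hi, s_hi * p_hi * 12 / h_lo

point, low, high = estimate_houses(RANGES)
naive_low, naive_high = naive_extremes(RANGES)
print(f"point ~ {point:,.0f} houses, 90% interval ~ [{low:,.0f}, {high:,.0f}]")
print(f"naive extremes: [{naive_low:,.0f}, {naive_high:,.0f}]")
```

With these made-up ranges the point estimate lands near 30,000 houses and the log-space interval spans roughly a factor of 2 in each direction. The stacked-extremes interval spans about a factor of 3.2 each way, which illustrates why it covers far more than 90%.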
#### What This Part Should Cover
- A clean decomposition into a small number of independently estimable factors.
- Explicit per-factor assumptions with plausible ranges, avoiding false precision.
- An interval built by propagating multiplicative uncertainty (log-space reasoning), not by ad-hoc widening.
- Sanity checks: per-household monthly spend, total spend versus the scale of US consumer spending, and whether the implied number of houses is a sensible fraction of annual US home sales.
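The sanity checks in the last bullet can be made concrete with a short script. This is only a sketch: the point-estimate inputs and the three comparison magnitudes (US households, annual consumer spending, annual home sales) are rough remembered orders of magnitude assumed for illustration, and a candidate would substitute their own.

```python
# ASSUMPTION: a candidate's point-estimate inputs (illustrative guesses).
us_subscribers = 70e6
monthly_price_usd = 15.0
median_home_price_usd = 400e3

# ASSUMPTION: rough comparison magnitudes recalled from general knowledge.
us_households = 130e6
us_consumer_spending_usd = 15e12  # annual, order of magnitude only
us_home_sales_per_year = 5e6

annual_spend = us_subscribers * monthly_price_usd * 12
houses = annual_spend / median_home_price_usd

# Check 1: penetration must be a plausible fraction of households.
penetration = us_subscribers / us_households
# Check 2: spend averaged over ALL households should look like a small bill.
spend_per_household_month = us_subscribers * monthly_price_usd / us_households
# Check 3: one streaming service should be a tiny share of consumer spending.
share_of_consumer_spending = annual_spend / us_consumer_spending_usd
# Check 4: the houses bought should be a small fraction of annual home sales.
share_of_home_sales = houses / us_home_sales_per_year

print(f"annual spend: ${annual_spend / 1e9:.1f}B, houses: {houses:,.0f}")
print(f"penetration: {penetration:.0%}")
print(f"per-household monthly spend: ${spend_per_household_month:.2f}")
print(f"share of consumer spending: {share_of_consumer_spending:.3%}")
print(f"share of annual home sales: {share_of_home_sales:.2%}")
```

If any ratio came out above 1, or the per-household figure looked like a rent payment rather than a subscription, that would signal an error in one of the factors rather than a surprising fact about the world.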
### Part 2
You are now told that Netflix has about 81 million US subscribers. Update your point estimate and your 90% interval. Should the interval get wider or narrower, and roughly by how much?
```hint Which uncertainty disappears
Your interval width came from several independent sources of uncertainty whose log-variances add. Treating the stated subscriber count as known removes exactly one of those variance terms; recompute the total from the terms that remain.
```
#### What This Part Should Cover
- Correctly recomputing the point estimate with the given subscriber figure.
- Recognizing that the interval must narrow, because one component of variance is eliminated while the others are unchanged.
- Quantifying the reduction approximately (new total log-standard-deviation from the remaining terms).
- A calibration check: does 81M fall inside the subscriber range assumed in Part 1, and what would it mean if it did not?
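A minimal sketch of the update, assuming the same kind of illustrative Part 1 ranges a candidate might have stated (all three ranges are guesses for this example, and `log_params` and `summarize` are hypothetical helper names). The subscriber count is set to the 81 million given in the prompt and its log-variance is treated as zero.

```python
import math

Z90 = 1.645  # z-score bounding the central 90% of a normal distribution

# ASSUMPTION: illustrative Part 1 ranges (low, high), guessed for this sketch.
PRIOR_RANGES = {
    "us_subscribers": (40e6, 120e6),
    "monthly_price_usd": (10.0, 20.0),
    "median_home_price_usd": (300e3, 500e3),
}
KNOWN_SUBSCRIBERS = 81e6  # figure supplied in Part 2

def log_params(low, high):
    """Median and log-sd of a log-normal with central 90% range [low, high]."""
    return math.sqrt(low * high), math.log(high / low) / (2 * Z90)

def summarize(subs, s_subs, ranges):
    """Point estimate and multiplicative 90% half-width for a given
    subscriber median and subscriber log-sd."""
    price, s_price = log_params(*ranges["monthly_price_usd"])
    home, s_home = log_params(*ranges["median_home_price_usd"])
    point = subs * price * 12 / home
    sigma_total = math.sqrt(s_subs**2 + s_price**2 + s_home**2)
    return point, math.exp(Z90 * sigma_total)

prior_subs, prior_s_subs = log_params(*PRIOR_RANGES["us_subscribers"])
point_before, factor_before = summarize(prior_subs, prior_s_subs, PRIOR_RANGES)
# Subscriber count is now treated as known: its variance term drops to zero.
point_after, factor_after = summarize(KNOWN_SUBSCRIBERS, 0.0, PRIOR_RANGES)

# Calibration check: was the revealed value inside the stated 90% range?
sub_low, sub_high = PRIOR_RANGES["us_subscribers"]
in_prior_range = sub_low <= KNOWN_SUBSCRIBERS <= sub_high

print(f"before: {point_before:,.0f} houses, x/÷ {factor_before:.2f}")
print(f"after:  {point_after:,.0f} houses, x/÷ {factor_after:.2f}")
print(f"81M inside prior subscriber range: {in_prior_range}")
```

Under these assumed ranges the multiplicative half-width falls from about 2.0 to about 1.5, and the point estimate moves from roughly 30,000 to roughly 35,000 houses. How much the interval narrows depends entirely on how large the subscriber term was relative to the price and home-price terms.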
### What a Strong Answer Covers
Across both parts: genuine calibration (a 90% interval that is neither timidly wide nor overconfidently narrow); a clear separation between the point estimate and the uncertainty around it; transparent mental arithmetic with clean rounding; and correct Bayesian instincts — new information moves the point estimate within the old interval and shrinks the uncertainty rather than being bolted on ad hoc.
### Follow-up Questions
- Make a market on this quantity: what bid and ask would you quote, and how do they relate to your 90% interval?
- Of the assumptions that remain after Part 2, which single number would you pay the most to learn exactly, and why (value of information)?
- How would the answer change if "houses" meant homes in San Francisco rather than the national median?
- If you answered 100 questions like this, how would you test whether your 90% intervals are actually calibrated?
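For the last follow-up, one simple approach (a sketch, not the only valid test) is to count how many of the 100 intervals contained the true value and compare that count with a Binomial(100, 0.9) distribution. The hit count below is a hypothetical example, and the helper names are made up for illustration.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def lower_tail(hits, n, p=0.9):
    """P(X <= hits): small values suggest intervals are too narrow."""
    return sum(binom_pmf(k, n, p) for k in range(hits + 1))

def upper_tail(hits, n, p=0.9):
    """P(X >= hits): small values suggest intervals are too wide."""
    return sum(binom_pmf(k, n, p) for k in range(hits, n + 1))

# ASSUMPTION: hypothetical scored results for 100 questions.
n_questions = 100
hits = 78  # intervals that contained the true value

hit_rate = hits / n_questions
p_overconfident = lower_tail(hits, n_questions)
p_underconfident = upper_tail(hits, n_questions)

print(f"hit rate: {hit_rate:.0%} (target 90%)")
print(f"P(hits <= {hits} | calibrated) = {p_overconfident:.2e}")
print(f"P(hits >= {hits} | calibrated) = {p_underconfident:.4f}")
```

A hit rate well below 90% with a tiny lower-tail probability points to overconfident (too narrow) intervals; a hit rate near 100% with a small upper-tail probability points to intervals that are too wide. This treats the questions as independent, which is itself an assumption.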