Analyze ad targeting expectations and distributions
Company: Meta
Role: Data Scientist
Category: Statistics & Math
Difficulty: medium
Interview Round: HR Screen
You operate an ad slot with two user segments: High-Intent (H, 90% of traffic) and Low-Intent (L, 10% of traffic). If shown an ad:
- H clicks with probability 0.30; given a click, converts with probability 0.40.
- L clicks with probability 0.05; given a click, converts with probability 0.10.
- Revenue per conversion = $10. Cost per impression = $0.002 (i.e., $2 CPM). Assume independence across impressions.
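As a sanity check on the setup above, the per-impression conversion probability for each segment is P(click) × P(convert | click), and expected profit per impression is revenue times that probability minus the impression cost. The following is a minimal Python sketch; the constant and function names are illustrative and not part of the original question.

```python
# Per-impression economics for each segment (all names are illustrative).
REVENUE_PER_CONVERSION = 10.0   # dollars, from the problem statement
COST_PER_IMPRESSION = 0.002     # dollars, i.e. $2 CPM

# segment -> (P(click), P(convert | click)), as stated in the problem
FUNNEL = {"H": (0.30, 0.40), "L": (0.05, 0.10)}


def conversion_probability(segment: str) -> float:
    """P(convert) per impression = P(click) * P(convert | click)."""
    p_click, p_convert_given_click = FUNNEL[segment]
    return p_click * p_convert_given_click


def profit_per_impression(segment: str) -> float:
    """Expected revenue per impression minus the impression cost."""
    return REVENUE_PER_CONVERSION * conversion_probability(segment) - COST_PER_IMPRESSION


for seg in FUNNEL:
    print(seg, round(conversion_probability(seg), 4), round(profit_per_impression(seg), 4))
```

Under these numbers both segments have positive expected profit per impression (about $1.198 for H and $0.048 for L), which matters for any strategy that withholds impressions.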
Answer:
1) Compute expected profit per 1,000 impressions for three strategies:
- (S1) Show to everyone.
- (S2) Show only to predicted-High users, where your classifier has 95% precision and 80% recall on H vs. L.
- (S3) Show only when a user’s posterior P(convert) exceeds a threshold t. Derive the profit-maximizing threshold t* and give its numeric value under these economics.
2) For S1, compute the variance of the number of conversions per 1,000 impressions. Show your decomposition across the H/L mixture (law of total variance). State whether the mixture makes the count more or less dispersed than 1,000 Bernoulli trials at the single average conversion rate.
3) Your PM proposes “send all impressions to High-Intent only.” List two quantitative pros and two cons using your results (e.g., impact on reach, profit sensitivity to precision/recall errors).
4) Exponential inter-arrival model: Assume user sessions arrive as a Poisson process with rate λ = 0.2 per minute.
a) Let T be the time in minutes between consecutive arrivals, so T ~ Exp(λ). Write the PDF and CDF of T. Compute E[T], Var(T), and P(T > 10).
b) Let T̄_n be the sample mean of n IID draws from Exp(λ). State the limit of T̄_n as n → ∞ and the approximate distribution of √n (T̄_n − 1/λ) for large n (name the theorem). Briefly relate this to why large-sample estimates of average time-spent stabilize in dashboards.
5) If real traffic is a 90/10 H/L mixture with different click propensities, is the inter-click-time distribution exponential? If not, name the resulting family qualitatively and one diagnostic you would plot to detect the mixture.
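For parts 1 and 2, one possible way to organize the arithmetic is sketched below in Python. It rests on several assumptions that the question leaves open: the segment of each impression is drawn independently with the 90/10 split; for S2, "per 1,000 impressions" is computed both per 1,000 impressions actually served and per 1,000 eligible impressions; and for S3 the threshold is the break-even point where expected revenue equals impression cost. All names are illustrative.

```python
# Sketch for parts 1-2 under the assumptions stated in the lead-in.
REVENUE = 10.0      # dollars per conversion
COST = 0.002        # dollars per impression
N = 1000            # impressions per batch

SHARE = {"H": 0.9, "L": 0.1}
P_CONV = {"H": 0.30 * 0.40, "L": 0.05 * 0.10}  # P(click) * P(convert | click)


def profit_per_n(p_conv: float, n: int = N) -> float:
    """Expected profit for n impressions with conversion probability p_conv."""
    return n * (REVENUE * p_conv - COST)


# S1: show to everyone, so the blended conversion rate applies.
p_bar = sum(SHARE[s] * P_CONV[s] for s in SHARE)
s1_profit = profit_per_n(p_bar)

# S2: among served impressions, a fraction PRECISION is truly H.
PRECISION, RECALL = 0.95, 0.80
p_shown = PRECISION * P_CONV["H"] + (1 - PRECISION) * P_CONV["L"]
s2_profit_per_served = profit_per_n(p_shown)
# Fraction of eligible traffic that gets served: true positives / precision.
served_fraction = RECALL * SHARE["H"] / PRECISION
s2_profit_per_eligible = served_fraction * s2_profit_per_served

# S3: show when REVENUE * P(convert) > COST, i.e. P(convert) > t*.
t_star = COST / REVENUE
s3_segments = [s for s in SHARE if P_CONV[s] > t_star]

# Part 2: law of total variance for one impression, then scale by N.
within = sum(SHARE[s] * P_CONV[s] * (1 - P_CONV[s]) for s in SHARE)
between = sum(SHARE[s] * (P_CONV[s] - p_bar) ** 2 for s in SHARE)
var_s1 = N * (within + between)
var_single = N * p_bar * (1 - p_bar)   # Bernoulli at the average rate
var_fixed_split = N * within           # if the split were exactly 900/100

print(s1_profit, s2_profit_per_served, s2_profit_per_eligible)
print(t_star, s3_segments)
print(var_s1, var_single, var_fixed_split)
```

Under these assumptions the within and between terms sum to exactly N·p̄(1 − p̄), so a per-impression random mixture adds no overdispersion relative to a single Bernoulli rate; only a fixed 900/100 split would lower the variance. Both segments clear t*, so S3 reduces to S1 with these economics.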
Quick Answer: This question evaluates a data scientist's competency in probabilistic modeling, expected-value decision-making, variance decomposition for mixture distributions, and Poisson/exponential arrival-process analysis within the Statistics & Math domain.
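The exponential quantities in part 4 and the mixture diagnostic in part 5 can be checked numerically. The sketch below uses only the standard library. The mixture is a hypothetical illustration: it assumes each user clicks as an independent Poisson process whose rate is λ times the segment's click probability, and that pooled inter-click times mix those two exponentials with weights 0.9 and 0.1. Neither the rates nor the weights are given in the question.

```python
import math

lam = 0.2  # session arrivals per minute, from part 4


def exp_pdf(t: float, rate: float) -> float:
    """PDF of Exp(rate): rate * exp(-rate * t) for t >= 0."""
    return rate * math.exp(-rate * t) if t >= 0 else 0.0


def exp_cdf(t: float, rate: float) -> float:
    """CDF of Exp(rate): 1 - exp(-rate * t) for t >= 0."""
    return 1.0 - math.exp(-rate * t) if t >= 0 else 0.0


mean_T = 1 / lam                 # E[T] = 1 / lambda
var_T = 1 / lam ** 2             # Var(T) = 1 / lambda^2
p_gt_10 = 1 - exp_cdf(10, lam)   # P(T > 10) = exp(-10 * lambda)

# Hypothetical two-component mixture (assumed weights and rates).
weights = (0.9, 0.1)
rates = (lam * 0.30, lam * 0.05)


def mixture_survival(t: float) -> float:
    """P(T > t) for the mixture of exponentials (hyperexponential)."""
    return sum(w * math.exp(-r * t) for w, r in zip(weights, rates))


mix_mean = sum(w / r for w, r in zip(weights, rates))
mix_second_moment = sum(2 * w / r ** 2 for w, r in zip(weights, rates))
# Squared coefficient of variation: 1 for an exponential, > 1 for the mixture.
mix_cv2 = (mix_second_moment - mix_mean ** 2) / mix_mean ** 2

# Diagnostic: slope of log-survival over one minute, early vs. late.
early_slope = math.log(mixture_survival(1.0)) - math.log(mixture_survival(0.0))
late_slope = math.log(mixture_survival(201.0)) - math.log(mixture_survival(200.0))

print(mean_T, var_T, p_gt_10)
print(mix_cv2, early_slope, late_slope)
```

A pure exponential gives a straight line on a log-survival plot, so the same slope everywhere. In this sketch the slope flattens over time as the slower component dominates the tail, which is the curvature such a plot would reveal.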