Estimate billboard reach and impressions
Company: Pinterest
Role: Data Scientist
Category: Statistics & Math
Difficulty: hard
Interview Round: Onsite
A single digital billboard sits beside a 6-lane urban expressway. Estimate its weekly unique reach (people who saw it at least once) and total impressions, then extend the analysis to estimate conversion to store visits using a simple Markov chain model.
Given data and assumptions:
- Average weekday vehicles/day: 80,000; weekend: 50,000. Average occupants/vehicle: 1.4.
- Visibility probability per pass depends on lane and time of day; use p_vis = 0.75 (daytime) and 0.55 (night), weighted by traffic share (70% daytime, 30% night).
- Share of traffic by segment: locals living within 3 km (30%), commuters passing ≥4 weekdays (50%), occasional passers (20%).
- For commuters, passes/week ~ Poisson(λ=5). For locals, passes/week ~ Poisson(λ=2). For occasional passers, passes/week ~ Poisson(λ=1).
- Deduplicate to unique people using a mobile location panel of 120,000 devices/week observed within 500 m; device-to-person expansion factor: 2.2; panel capture rate uncertainty ±10% (1σ).
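As a starting point, the traffic, occupancy, and visibility inputs above can be combined into a weekly impression count. The sketch below is one possible reading, not a reference solution: it assumes 5 weekdays and 2 weekend days per week, treats every occupant as a potential viewer, and assumes each pass is seen independently with the traffic-weighted p_vis. All function and constant names are illustrative.

```python
# Inputs taken from the problem statement.
WEEKDAY_VEHICLES = 80_000
WEEKEND_VEHICLES = 50_000
OCCUPANTS_PER_VEHICLE = 1.4
P_VIS_DAY, P_VIS_NIGHT, DAY_SHARE = 0.75, 0.55, 0.70


def weighted_visibility(p_day, p_night, day_share):
    """Traffic-weighted probability that a single pass results in a view."""
    return day_share * p_day + (1.0 - day_share) * p_night


def weekly_person_passes(weekday, weekend, occupants):
    """Person-passes per week, assuming 5 weekdays and 2 weekend days."""
    return (5 * weekday + 2 * weekend) * occupants


p_vis = weighted_visibility(P_VIS_DAY, P_VIS_NIGHT, DAY_SHARE)
passes = weekly_person_passes(WEEKDAY_VEHICLES, WEEKEND_VEHICLES, OCCUPANTS_PER_VEHICLE)
# Expected impressions = person-passes x probability a pass is seen.
impressions = passes * p_vis

print(f"p_vis = {p_vis:.2f}")
print(f"person-passes/week = {passes:,.0f}")
print(f"expected impressions/week = {impressions:,.0f}")
```

Under these assumptions the arithmetic is 0.7 × 0.75 + 0.3 × 0.55 = 0.69 for p_vis, 500,000 vehicles × 1.4 = 700,000 person-passes, and roughly 483,000 expected impressions per week.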
Tasks:
1) Estimate weekly unique reach and total impressions with 95% CIs, clearly stating all formulas and independence assumptions. Show how you combine traffic counts, occupancy, visibility, and pass frequency to compute impressions, and how you deduplicate to people-level using the panel (include the capture-rate uncertainty via delta method or bootstrap).
2) Using a 3-state Markov chain (Unaware → Aware → Visit), propose reasonable transition probabilities by segment and compute expected visits/week attributable to the billboard. Discuss sensitivity of results to these probabilities and to p_vis.
3) Identify at least three major bias sources (e.g., panel selection, deduplication error, dwell-time bias) and propose corrections/validations.
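For Task 1, one way to sketch the people-level reach and its interval is shown below. It rests on several assumptions that are not in the problem statement: the segment shares are read as shares of person-passes (so people in a segment are proportional to share/λ), views are independent across passes (so views per person are Poisson(λ·p_vis)), everyone in the panel footprint belongs to one of the three segments, and the capture-rate uncertainty is modelled as a relative factor c with mean 1 and σ = 0.10 that divides the expanded count. The second interval is a parametric Monte Carlo simulation of c, used here in place of a true bootstrap over raw panel records.

```python
import math
import random

P_VIS = 0.7 * 0.75 + 0.3 * 0.55  # traffic-weighted visibility per pass
# segment -> (share of person-passes, Poisson mean passes/week)
SEGMENTS = {"commuter": (0.50, 5.0), "local": (0.30, 2.0), "occasional": (0.20, 1.0)}
PANEL_DEVICES, EXPANSION, CAPTURE_SD = 120_000, 2.2, 0.10


def seen_at_least_once(lam, p_vis):
    """P(at least one view) when passes ~ Poisson(lam), each seen w.p. p_vis."""
    return 1.0 - math.exp(-lam * p_vis)


def blended_seen_rate(segments, p_vis):
    """People-weighted seen rate; people per segment are proportional to share / lam."""
    weights = {name: share / lam for name, (share, lam) in segments.items()}
    total = sum(weights.values())
    return sum(
        weights[name] * seen_at_least_once(lam, p_vis)
        for name, (_, lam) in segments.items()
    ) / total


def delta_method_ci(devices, expansion, rate, capture_sd, z=1.96):
    """Reach = devices * expansion * rate / c with c ~ (1, capture_sd).
    First-order: sd(reach) is approximately point * capture_sd."""
    point = devices * expansion * rate
    sd = point * capture_sd
    return point, point - z * sd, point + z * sd


def monte_carlo_ci(devices, expansion, rate, capture_sd, draws=20_000, seed=0):
    """Percentile interval from simulating the capture factor c directly."""
    rng = random.Random(seed)
    sims = []
    for _ in range(draws):
        c = max(rng.gauss(1.0, capture_sd), 0.05)  # guard against c near zero
        sims.append(devices * expansion * rate / c)
    sims.sort()
    return sims[int(0.025 * draws)], sims[int(0.975 * draws) - 1]


rate = blended_seen_rate(SEGMENTS, P_VIS)
point, lo, hi = delta_method_ci(PANEL_DEVICES, EXPANSION, rate, CAPTURE_SD)
mc_lo, mc_hi = monte_carlo_ci(PANEL_DEVICES, EXPANSION, rate, CAPTURE_SD)

print(f"blended seen rate = {rate:.3f}")
print(f"reach (delta method) = {point:,.0f}  95% CI [{lo:,.0f}, {hi:,.0f}]")
print(f"reach (Monte Carlo)  95% CI [{mc_lo:,.0f}, {mc_hi:,.0f}]")
```

The delta-method interval is symmetric by construction, while the simulated interval is skewed upward because reach depends on 1/c. Both capture only the panel uncertainty; uncertainty in traffic counts, occupancy, and p_vis would need to be added as further terms.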
Quick Answer: This question evaluates probabilistic modeling and statistical inference skills applied to audience measurement and attribution within the Statistics & Math / Data Science domain. It covers Poisson-based frequency modeling, visibility-adjusted impressions, panel-based deduplication and expansion, uncertainty propagation (delta method/bootstrap), and Markov-chain attribution. It is commonly asked to test the ability to convert traffic and visibility inputs into reach and impression estimates with propagated confidence intervals, reason about attribution via a 3-state Markov chain, and identify measurement biases, demonstrating both conceptual understanding and practical application through numerical estimation and sensitivity analysis.
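The 3-state Markov chain attribution can be sketched as a per-exposure chain in which each seen impression moves an Unaware person to Aware with probability p_aware, and an Aware person to Visit with probability p_visit, with Visit absorbing. The transition probabilities below are illustrative placeholders, not values given in the problem. The sketch also assumes segment shares are shares of the 700,000 weekly person-passes (so people per segment = share × passes / λ) and that seen impressions per person are Poisson(λ·p_vis).

```python
import math

PERSON_PASSES = (5 * 80_000 + 2 * 50_000) * 1.4  # weekly person-passes
# segment -> (share of passes, Poisson mean passes, p_aware, p_visit)
# p_aware and p_visit are hypothetical per-impression transition probabilities.
SEGMENTS = {
    "commuter": (0.50, 5.0, 0.20, 0.02),
    "local": (0.30, 2.0, 0.30, 0.05),
    "occasional": (0.20, 1.0, 0.10, 0.01),
}


def visit_prob_after(k, p_aware, p_visit):
    """P(state == Visit) after k seen impressions, starting from Unaware."""
    unaware, aware, visit = 1.0, 0.0, 0.0
    for _ in range(k):
        unaware, aware, visit = (
            unaware * (1.0 - p_aware),
            unaware * p_aware + aware * (1.0 - p_visit),
            aware * p_visit + visit,
        )
    return visit


def expected_visit_prob(mean_seen, p_aware, p_visit, k_max=60):
    """Average visit probability over K ~ Poisson(mean_seen), truncated at k_max."""
    total, pmf = 0.0, math.exp(-mean_seen)  # pmf starts at P(K = 0)
    for k in range(k_max + 1):
        total += pmf * visit_prob_after(k, p_aware, p_visit)
        pmf *= mean_seen / (k + 1)  # Poisson recurrence to P(K = k + 1)
    return total


def expected_visits(p_vis):
    """Expected weekly visits summed over segments for a given p_vis."""
    total = 0.0
    for share, lam, p_aware, p_visit in SEGMENTS.values():
        people = share * PERSON_PASSES / lam
        total += people * expected_visit_prob(lam * p_vis, p_aware, p_visit)
    return total


# Sensitivity to p_vis: night-only, traffic-weighted, and daytime-only values.
for p_vis in (0.55, 0.69, 0.75):
    print(f"p_vis = {p_vis:.2f} -> expected visits/week = {expected_visits(p_vis):,.0f}")
```

Because a visit in this chain requires at least two seen impressions, the result is dominated by commuters and locals and is roughly proportional to p_aware × p_visit when both are small, which makes it a natural place to probe sensitivity.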