Privacy-Conscious Measurement and Differential Privacy
Asked of: Data Scientist

-
What it is
Privacy-conscious measurement means learning useful aggregate signals (e.g., ad lift, conversion rates) while minimizing exposure of raw user data. Differential privacy (DP) is a formal framework that bounds how much any one person's data can change the distribution of a result, typically by adding noise calibrated to the query's sensitivity.
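For reference, the standard (ε, δ) guarantee can be written as below. The symbols are notation introduced here for illustration: M is the randomized mechanism, D and D′ are any two datasets differing in one person's data, and S is any set of possible outputs. Setting δ = 0 gives pure ε-DP.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S
```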
-
Why interviewers ask about it
Large platforms must measure product impact and ad performance under tighter privacy constraints, cookie deprecation, and regulation. Teams expect Data Scientists to reason about trade-offs, design guardrails, and choose methods (e.g., DP, secure aggregation, multi-party computation (MPC)) that preserve utility without leaking user data.
-
Core ideas to know
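- Epsilon (and delta) quantify privacy loss: epsilon bounds it, and delta is a small additive slack on that bound. Smaller epsilon = stronger privacy, more noise.
- Sensitivity (the most one user can change a query's result) sets the noise scale: underestimate it and the guarantee fails; overestimate it and the noise destroys utility.
- Composition: repeated queries consume privacy budget (under basic composition, epsilons add up); track and allocate budgets explicitly.
- Central vs. local DP: a trusted curator adds noise to the aggregate vs. each client adds noise before sending. At the same epsilon, local DP needs far more noise, so it needs much larger populations for comparable accuracy.
- Post-processing invariance: any transformation of a DP output that does not touch the raw data again remains DP at the same epsilon (e.g., rounding a noisy count or clamping it to be non-negative).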
- Secure aggregation/MPC protect inputs during compute; DP limits what the output can reveal.
- Real systems add thresholds and delays to reduce re‑identification from sparse events.
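
The epsilon and sensitivity bullets above can be made concrete with the Laplace mechanism. The following is a minimal sketch, not a production implementation: the function names (`laplace_scale`, `dp_count`, `dp_sum`) are hypothetical, it assumes each user contributes exactly one row, and it uses add-or-remove-one-user neighboring datasets. Plain floating-point sampling like this is not hardened against known precision attacks, so a vetted DP library would be the safer choice in practice.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(flags, epsilon: float, rng: random.Random) -> float:
    """Noisy count of truthy flags.

    Assumes one row per user, so adding or removing a user changes
    the count by at most 1 (sensitivity = 1).
    """
    true_count = sum(1 for f in flags if f)
    return true_count + laplace_noise(laplace_scale(1.0, epsilon), rng)


def dp_sum(values, clip: float, epsilon: float, rng: random.Random) -> float:
    """Noisy sum of per-user values clipped to [0, clip].

    Clipping is what bounds the sensitivity: one user can move the
    sum by at most `clip`, so the noise scale is clip / epsilon.
    """
    clipped_total = sum(min(max(float(v), 0.0), clip) for v in values)
    return clipped_total + laplace_noise(laplace_scale(clip, epsilon), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    converted = [rng.random() < 0.03 for _ in range(10_000)]
    # Smaller epsilon gives a larger noise scale and a noisier count.
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(dp_count(converted, eps, rng), 1))
```

Halving epsilon doubles the noise scale, and so does doubling the clip bound. That is why the clip value is a utility decision as much as a privacy one: too low and it biases the sum, too high and it inflates the noise.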
-
A common pitfall
Candidates conflate cryptography with DP and assume encryption alone solves leakage from outputs. Others pick epsilon arbitrarily, ignore query sensitivity, or forget composition across dashboards and slices. Many propose “aggregate then add a tiny bit of noise,” which fails on sparse cohorts and enables linkage attacks. Strong answers name specific mechanisms (Laplace/Gaussian), explain how the privacy budget is managed, and say when to combine DP with MPC or secure aggregation for incrementality testing.
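
One way to avoid forgetting composition across dashboards and slices is to route every release through an explicit ledger. The sketch below is illustrative only: `PrivacyBudget` and `BudgetExceeded` are hypothetical names, and it assumes basic sequential composition, where the epsilons of queries over the same users simply add. That is a worst-case bound; tighter accounting methods exist but are not shown here.

```python
class BudgetExceeded(Exception):
    """Raised when a query would push total epsilon past the cap."""


class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon: float):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0
        self.ledger = []  # (label, epsilon) for every approved query

    @property
    def remaining(self) -> float:
        return max(self.total - self.spent, 0.0)

    def spend(self, label: str, epsilon: float) -> None:
        """Record a query's epsilon, or refuse it if the cap would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so float rounding does not reject an exact fit.
        if self.spent + epsilon > self.total + 1e-12:
            raise BudgetExceeded(
                f"{label}: needs {epsilon}, only {self.remaining} left"
            )
        self.spent += epsilon
        self.ledger.append((label, epsilon))


if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)
    budget.spend("overall conversion rate", 0.5)
    budget.spend("conversion rate by country", 0.25)
    try:
        budget.spend("conversion rate by device", 0.5)
    except BudgetExceeded as err:
        print("refused:", err)
    print("remaining:", budget.remaining)
```

The design point is that a refused query spends nothing, so the ledger always reflects what was actually released. Every new dashboard slice then has to be paid for explicitly rather than added silently.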
-
Further reading
- US Census: Understanding Differential Privacy. Clear, non-marketing primer plus real deployment details from the 2020 Census. https://www.census.gov/programs-surveys/decennial-census/decade/2020/planning-management/process/disclosure-avoidance/differential-privacy.html
- Google Privacy Sandbox: Attribution Reporting (summary reports). Concrete example of browser-side, noised aggregated measurement with timing delays and reporting limits. https://privacysandbox.google.com/private-advertising/attribution-reporting/summary-reports
- Meta Private Lift Measurement (MPC Deployments). How incrementality is measured using multi-party computation; useful for understanding DP’s role alongside cryptography. https://mpc.cs.berkeley.edu/posts/Private-Lift-Measurement-for-Private-Advertising/