Reason about MAU error after user ID rehash

Q: Reason about MAU error after user ID rehash

This is an Analytics & Experimentation interview question from Glean for Data Scientist roles. View the full question and solution on PracHub.

Q: How do I approach Analytics & Experimentation interview questions?

Analytics & Experimentation questions require a solid understanding of core concepts, plus practice. PracHub provides solutions with explanations to help you prepare for Analytics & Experimentation interviews.

Question

A product tracks activity using user_id from login events, and computes MAU as:

MAU (L30D) on date d = number of distinct user_id with at least one login in the window [d-29, d] (inclusive).
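
As a minimal sketch of this definition, the function below counts distinct IDs in the inclusive 30-day window. The event layout (a list of `(date, user_id)` tuples) and the name `mau_l30d` are assumptions made for illustration, not part of the original question.

```python
from datetime import date, timedelta

def mau_l30d(events, d):
    """Count distinct user_id values with at least one login in [d-29, d], inclusive."""
    start = d - timedelta(days=29)
    return len({uid for day, uid in events if start <= day <= d})

# Hypothetical login events: (login date, user_id)
events = [
    (date(2024, 1, 1), "u1"),
    (date(2024, 1, 2), "u2"),
    (date(2024, 1, 30), "u1"),
    (date(2024, 1, 31), "u3"),
]

mau_jan30 = mau_l30d(events, date(2024, 1, 30))  # window 2024-01-01 .. 2024-01-30
mau_jan31 = mau_l30d(events, date(2024, 1, 31))  # window 2024-01-02 .. 2024-01-31
print(mau_jan30, mau_jan31)
```

Note that the window holds exactly 30 calendar days because both endpoints are included.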

Data change event

On a single day T, the company performs a one-time rehash of all user IDs:

For dates < T, events use the old user_id_old.
For dates ≥ T, events use the new user_id_new.
Each real person gets exactly one new ID (a 1-to-1 remapping), but your metric pipeline does not have the mapping between old and new IDs.
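
The setup above can be simulated with a toy example. This is only a sketch under stated assumptions: the person names, the dates, and the `rehash` function (a SHA-256 stand-in for whatever remapping the company actually used) are all hypothetical. The simulation knows who each real person is, while the naive count only sees the logged ID.

```python
from datetime import date, timedelta
import hashlib

def rehash(old_id):
    # Hypothetical stand-in for the one-time rehash; the real scheme is not given.
    return "new_" + hashlib.sha256(old_id.encode()).hexdigest()

def logged_id(person, day, T):
    """ID that appears in the login event: old before T, new on or after T."""
    old_id = f"old_{person}"
    return old_id if day < T else rehash(old_id)

def naive_and_true_mau(logins, d, T):
    """Return (naive distinct-ID count, true distinct-person count) for [d-29, d]."""
    start = d - timedelta(days=29)
    in_window = [(p, day) for p, day in logins if start <= day <= d]
    naive = len({logged_id(p, day, T) for p, day in in_window})
    true = len({p for p, _ in in_window})
    return naive, true

T = date(2024, 3, 15)   # rehash day (assumed)
d = date(2024, 3, 20)   # window 2024-02-20 .. 2024-03-20 straddles T
logins = [
    ("alice", date(2024, 3, 1)), ("alice", date(2024, 3, 16)),  # active on both sides of T
    ("bob", date(2024, 3, 2)),                                  # active only before T
    ("carol", date(2024, 3, 18)),                               # active only after T
]

naive, true = naive_and_true_mau(logins, d, T)
print(naive, true, (naive - true) / true)
```

In this toy model, only a person with logins on both sides of T inside the window is counted twice, so the relative overestimate equals the share of real users in the window who were active both before and after T.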

Questions

For dates whose L30D window overlaps both sides of T, how can this rehash bias the computed MAU if you naïvely count distinct user_id?
What is the maximum possible MAU overestimate (as a percentage) and the minimum possible MAU overestimate (as a percentage), relative to the true number of distinct real users in the window?
Operationally, how would you redesign tracking/warehouse modeling to make MAU robust to this type of ID change?
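
One possible direction for the last question, sketched under assumptions: keep an identity map from every external ID ever issued to a stable internal key, and count distinct keys instead of raw IDs. The class `IdentityMap`, the field name `person_key`, and the `alias` step (which presumes the migration job writes old-to-new pairs at rehash time) are hypothetical and are not taken from the original solution.

```python
from datetime import date, timedelta

class IdentityMap:
    """Maps every external user_id ever issued to a stable internal person_key."""

    def __init__(self):
        self._key_by_id = {}
        self._next_key = 1

    def resolve(self, user_id):
        """Return the person_key for user_id, minting a new key on first sight."""
        if user_id not in self._key_by_id:
            self._key_by_id[user_id] = self._next_key
            self._next_key += 1
        return self._key_by_id[user_id]

    def alias(self, old_id, new_id):
        """Record that new_id belongs to the same person as old_id."""
        self._key_by_id[new_id] = self.resolve(old_id)

def mau_by_person(events, d, identity_map):
    """Count distinct person_key values with a login in [d-29, d], inclusive."""
    start = d - timedelta(days=29)
    return len({identity_map.resolve(uid) for day, uid in events if start <= day <= d})

# Assumed: the rehash job emits (old_id, new_id) pairs that are loaded before T.
ids = IdentityMap()
ids.alias("old_alice", "new_alice")
ids.alias("old_bob", "new_bob")

events = [
    (date(2024, 3, 1), "old_alice"), (date(2024, 3, 16), "new_alice"),
    (date(2024, 3, 2), "old_bob"),
    (date(2024, 3, 18), "new_carol"),  # first seen after the rehash
]

robust_mau = mau_by_person(events, date(2024, 3, 20), ids)
naive_mau = len({uid for _, uid in events})
print(robust_mau, naive_mau)
```

The design choice here is that the metric never depends on the external ID format: any future rehash only adds rows to the identity map, and historical MAU stays comparable.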
