A product tracks activity using user_id from login events, and computes MAU as:
- MAU (L30D) on date d = number of distinct user_id with at least one login in the window [d-29, d] (inclusive).
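As a minimal sketch of this definition, assuming login events are available as (login_date, user_id) pairs; the function name mau_l30d and the sample events are hypothetical and only illustrate the inclusive 30-day window:

```python
from datetime import date, timedelta

def mau_l30d(events, d):
    """Count distinct user_id values with at least one login in [d-29, d], inclusive."""
    start = d - timedelta(days=29)
    return len({user_id for login_date, user_id in events if start <= login_date <= d})

# Hypothetical sample data: (login_date, user_id)
events = [
    (date(2024, 1, 1), "u1"),
    (date(2024, 1, 15), "u2"),
    (date(2024, 1, 30), "u1"),
    (date(2024, 1, 31), "u3"),
]

print(mau_l30d(events, date(2024, 1, 30)))  # window is Jan 1 to Jan 30
print(mau_l30d(events, date(2024, 1, 31)))  # window is Jan 2 to Jan 31
```

Note that a user with several logins in the window is counted once, since the count is over distinct IDs rather than events.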
Data change event
On a single day T, the company performs a one-time rehash of all user IDs:
- For dates < T, events use the old ID, user_id_old.
- For dates ≥ T, events use the new ID, user_id_new.
- Each real person gets exactly one new ID (a 1-to-1 remapping), but your metric pipeline does not have the mapping between old and new IDs.
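The setup above can be reproduced on toy data. The sketch below assumes a hypothetical rehash date T and hypothetical person labels (alice, bob, carol) that a real pipeline would not have; they are used here only to compare the naive distinct count of logged IDs against the number of real people in the window:

```python
from datetime import date, timedelta

T = date(2024, 2, 1)  # hypothetical rehash date

def observed_id(person, login_date):
    """ID as it appears in the logs: old scheme before T, new scheme on or after T."""
    return f"old_{person}" if login_date < T else f"new_{person}"

def naive_and_actual_mau(logins, d):
    """Return (naive distinct logged IDs, distinct real people) for the window [d-29, d]."""
    start = d - timedelta(days=29)
    in_window = [(day, person) for day, person in logins if start <= day <= d]
    naive = len({observed_id(person, day) for day, person in in_window})
    actual = len({person for _, person in in_window})
    return naive, actual

# Hypothetical logins: (login_date, real person)
logins = [
    (date(2024, 1, 20), "alice"),  # before T
    (date(2024, 2, 3), "alice"),   # after T, so alice appears under two IDs
    (date(2024, 1, 25), "bob"),    # only before T
    (date(2024, 2, 5), "carol"),   # only after T
]

naive, actual = naive_and_actual_mau(logins, date(2024, 2, 10))
print(naive, actual)
```

In this toy data only alice logs in on both sides of T, so she is the only person whose two IDs are counted separately; windows that lie entirely before or entirely after T are unaffected.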
Questions
- For dates whose L30D window overlaps both sides of T, how can this rehash bias the computed MAU if you naïvely count distinct user_id?
- What is the maximum possible MAU overestimate (as a percentage) and the minimum possible MAU overestimate (as a percentage), relative to the true number of distinct real users in the window?
- Operationally, how would you redesign tracking/warehouse modeling to make MAU robust to this type of ID change?