Reason about MAU error after user ID rehash
Company: Glean
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: Technical Screen
A product tracks activity using `user_id` from login events, and computes MAU as:
- **MAU (L30D)** on date `d` = number of **distinct `user_id`** values with at least one login in the 30-day window `[d-29, d]` (both endpoints inclusive).
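The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a production query: the function name `mau_l30d`, the `(user_id, login_date)` event tuples, and the sample dates are all hypothetical.

```python
from datetime import date, timedelta

def mau_l30d(events, d):
    """Count distinct user_ids with at least one login in [d-29, d]."""
    start = d - timedelta(days=29)
    return len({uid for uid, day in events if start <= day <= d})

# Hypothetical login events: (user_id, login_date)
events = [
    ("u1", date(2024, 3, 1)),   # one day before the window for d = 2024-03-31
    ("u1", date(2024, 3, 15)),  # inside the window
    ("u2", date(2024, 3, 2)),   # exactly d-29, so still inside
    ("u3", date(2024, 2, 20)),  # outside the window
]

print(mau_l30d(events, date(2024, 3, 31)))  # u1 and u2 are counted
```

Repeat logins by the same `user_id` count once, and the inclusive lower bound `d-29` is what makes the window exactly 30 days long.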
### Data change event
On a single day **T**, the company performs a one-time **rehash** of all user IDs:
- For dates **< T**, events use the *old* `user_id_old`.
- For dates **≥ T**, events use the *new* `user_id_new`.
- Each real person gets exactly one new ID (a 1-to-1 remapping), but **your metric pipeline does not have the mapping** between old and new IDs.
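To make the setup concrete, here is a toy sketch of what the pipeline sees around the rehash. The rehash day `T`, the `old_`/`new_` prefixes, and the people and dates are invented for illustration; a real rehash would produce opaque IDs, and the `person` column is ground truth that the pipeline, by assumption, cannot see.

```python
from datetime import date, timedelta

T = date(2024, 3, 16)  # hypothetical rehash day

def logged_id(person, day):
    """ID the pipeline sees: old scheme before T, new scheme on or after T."""
    return f"old_{person}" if day < T else f"new_{person}"

# Ground truth the pipeline does NOT have: (person, login_date)
logins = [
    ("alice", date(2024, 3, 10)), ("alice", date(2024, 3, 20)),  # both sides of T
    ("bob", date(2024, 3, 12)),                                  # only before T
    ("carol", date(2024, 3, 25)),                                # only on/after T
]

d = date(2024, 3, 31)
start = d - timedelta(days=29)
in_window = [(p, day) for p, day in logins if start <= day <= d]

true_mau = len({p for p, _ in in_window})
naive_mau = len({logged_id(p, day) for p, day in in_window})
print(true_mau, naive_mau)
```

With the window `[d-29, d]`, only dates `d` from `T` through `T+28` have a window that contains both pre-rehash and post-rehash days; windows ending before `T` or starting on or after `T` see a single ID scheme.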
### Questions
1) For dates whose L30D window spans **T** (that is, includes days both before **T** and on or after **T**), how can this rehash bias the computed MAU if you naïvely count distinct `user_id`?
2) What is the **maximum possible MAU overestimate** (as a percentage) and the **minimum possible MAU overestimate** (as a percentage), relative to the true number of distinct real users in the window?
3) Operationally, how would you redesign tracking/warehouse modeling to make MAU robust to this type of ID change?
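One way to reason about questions 1 and 2 is to split the real users in a window into three groups: active only before **T**, active only on or after **T**, and active on both sides. The sketch below assumes that decomposition; the function name and the sample counts are hypothetical. Users in the "both" group appear under two IDs, so the naive count adds one extra ID per such user.

```python
def overestimate_pct(only_before, only_after, both):
    """Percent by which naive distinct-ID MAU exceeds the true MAU.

    only_before: real users active in the window only before T
    only_after:  real users active in the window only on/after T
    both:        real users active on both sides of T (counted twice naively)
    """
    true_mau = only_before + only_after + both
    naive_mau = only_before + only_after + 2 * both
    return 100.0 * (naive_mau - true_mau) / true_mau

print(overestimate_pct(5, 5, 0))   # no user straddles T
print(overestimate_pct(0, 0, 10))  # every user straddles T
print(overestimate_pct(4, 3, 3))   # a mixed case
```

Under this model the overestimate equals `both / true_mau`, so it ranges from 0% (no user is active on both sides of **T**) to 100% (every user in the window is active on both sides, and the naive count is exactly double).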
Quick Answer: This question evaluates a data scientist's understanding of identity management, windowed distinct-count metrics (MAU), and the impact of a one-time ID remapping on measurement bias and data modeling.
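On the data-modeling side, one common approach (offered here as a sketch, not as the expected answer) is to persist an identity map at rehash time that resolves every ID ever issued to a stable internal key, and to count distinct keys rather than raw IDs. The `id_map` contents, the `person_key` concept, and the function `count_people` are hypothetical names for illustration.

```python
def count_people(event_ids, id_map):
    """Resolve each logged ID to a stable person_key, then count distinct keys.

    Returns (distinct_count, unmapped_ids). IDs missing from the map are
    counted as separate people and reported so the gap is visible.
    """
    keys, unmapped = set(), set()
    for raw_id in event_ids:
        if raw_id in id_map:
            keys.add(("person", id_map[raw_id]))
        else:
            keys.add(("raw", raw_id))
            unmapped.add(raw_id)
    return len(keys), sorted(unmapped)

# Hypothetical identity map: every ID ever issued -> stable person_key
id_map = {"old_a1": 1, "new_x9": 1, "old_b2": 2, "new_y7": 2}

# IDs seen in a window that spans the rehash day
window_ids = ["old_a1", "new_x9", "old_b2", "new_z3"]
print(count_people(window_ids, id_map))
```

Here a naive distinct count of `window_ids` would give 4, while resolving through the map gives 3 and flags `new_z3` as unmapped. Surfacing unmapped IDs, rather than silently dropping them, keeps the metric auditable when the map is incomplete.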