PracHub

Reason about MAU error after user ID rehash

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's understanding of identity management, windowed distinct-count metrics (MAU), and the impact of a one-time ID remapping on measurement bias and data modeling.


Company: Glean

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: easy

Interview Round: Technical Screen




A product tracks activity using `user_id` from login events and computes MAU as:

  • MAU (L30D) on date `d` = number of distinct `user_id` values with at least one login in the window `[d-29, d]` (inclusive).
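As a minimal sketch of the definition above, the window logic can be written as a set comprehension over login events. The `(login_date, user_id)` tuple layout, the function name `mau_l30d`, and the sample events are assumptions made for illustration; they are not part of the original question.

```python
from datetime import date, timedelta


def mau_l30d(events, d):
    """Count distinct user_id values with a login in [d-29, d], inclusive."""
    window_start = d - timedelta(days=29)
    return len({
        user_id
        for login_date, user_id in events
        if window_start <= login_date <= d
    })


# Hypothetical login events as (login_date, user_id) pairs.
events = [
    (date(2026, 1, 1), "u1"),   # 30 days before d: outside the window
    (date(2026, 1, 2), "u2"),   # exactly d-29: inside the window
    (date(2026, 1, 20), "u2"),  # repeat login: still counted once
    (date(2026, 1, 31), "u3"),  # on d itself: inside the window
]

print(mau_l30d(events, date(2026, 1, 31)))  # 2
```

Both window endpoints are inclusive, so the window spans exactly 30 calendar days.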

Data change event

On a single day T, the company performs a one-time rehash of all user IDs:

  • For dates < T, events use the old `user_id_old`.
  • For dates ≥ T, events use the new `user_id_new`.
  • Each real person gets exactly one new ID (a 1-to-1 remapping), but your metric pipeline does not have the mapping between old and new IDs.
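To make the data change concrete, the sketch below simulates the event log around a rehash day. Everything named here is hypothetical: the date `T`, the `rehash` function (a SHA-256 stand-in for the unknown remapping), and the sample people. It assumes old and new IDs never collide, so a person who logs in on both sides of `T` appears under two different IDs.

```python
import hashlib
from datetime import date, timedelta

T = date(2026, 2, 1)  # hypothetical rehash day


def rehash(user_id_old):
    # Stand-in for the real remapping, which the metric pipeline cannot see.
    return "new_" + hashlib.sha256(user_id_old.encode()).hexdigest()


def logged_id(person, login_date):
    """ID that lands in the event log for a real person on a given date."""
    return person if login_date < T else rehash(person)


def naive_mau(logins, d):
    """Distinct logged IDs in [d-29, d], as the naive pipeline would count."""
    start = d - timedelta(days=29)
    return len({logged_id(p, day) for day, p in logins if start <= day <= d})


def true_mau(logins, d):
    """Distinct real people in [d-29, d]."""
    start = d - timedelta(days=29)
    return len({p for day, p in logins if start <= day <= d})


logins = [
    (date(2026, 1, 25), "alice"),  # active before T only
    (date(2026, 1, 28), "bob"),    # active on both sides of T
    (date(2026, 2, 3), "bob"),
    (date(2026, 2, 5), "carol"),   # active after T only
]

d = date(2026, 2, 10)  # window [2026-01-12, 2026-02-10] straddles T
print(true_mau(logins, d), naive_mau(logins, d))  # 3 4

d_clean = T + timedelta(days=29)  # window starts exactly at T
print(true_mau(logins, d_clean), naive_mau(logins, d_clean))  # 2 2
```

Under this definition, a window contains dates on both sides of `T` only for `d` from `T` through `T+28`; from `d = T+29` onward the window starts at `T` or later and contains only new IDs.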

Questions

  1. For dates whose L30D window overlaps both sides of T, how can this rehash bias the computed MAU if you naïvely count distinct `user_id`?
  2. What is the maximum possible MAU overestimate (as a percentage) and the minimum possible MAU overestimate (as a percentage), relative to the true number of distinct real users in the window?
  3. Operationally, how would you redesign tracking/warehouse modeling to make MAU robust to this type of ID change?
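One way to reason about questions 2 and 3 is sketched below; it is an illustration under stated assumptions, not an official solution. Split the real users in a straddling window into those active only before T, only after T, and on both sides. Assuming old and new IDs never collide, only the "both sides" group is counted twice. The function names, the `id_map` table, and the `person_key` values are hypothetical; the second function assumes the rehash job persists a mapping from every ID version to one stable key.

```python
def overestimate_pct(before_only, after_only, both):
    """Percent by which naive distinct-ID MAU exceeds true distinct people."""
    true_users = before_only + after_only + both
    # Each "both" user contributes one old ID and one new ID.
    naive_ids = (before_only + both) + (after_only + both)
    return 100.0 * (naive_ids - true_users) / true_users


print(overestimate_pct(0, 0, 500))      # 100.0: everyone active on both sides
print(overestimate_pct(300, 200, 0))    # 0.0: nobody active on both sides
print(overestimate_pct(100, 100, 200))  # 50.0: an intermediate case


def stable_mau(ids_in_window, id_map):
    """Count distinct people after resolving each ID to a stable person key."""
    return len({id_map[logged_id] for logged_id in ids_in_window})


# Hypothetical identity map: every ID version points at one person_key.
id_map = {"old_1": "p1", "new_9": "p1", "old_2": "p2"}
print(stable_mau({"old_1", "new_9", "old_2"}, id_map))  # 2
```

The overestimate equals the share of true users active on both sides of T, so under these assumptions it ranges from 0% to 100%. Counting on a stable key instead of the raw ID removes the double count, because both ID versions resolve to the same person.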


© 2026 PracHub. All rights reserved.