Identify Bugs in Python Script for User Assignment
Company: OpenAI
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Take-home Project
##### Scenario
A simple Python script assigns users to experiment groups and triggers the free-trial offer.
##### Question
Inspect the script and list any bugs or data-quality risks you notice. Suggest concrete refactors or additional checks that would make the assignment and triggering logic safer and more reproducible.
##### Hints
Think about seeding randomness, idempotency, logging and duplicate exposure safeguards.
Quick Answer: This question evaluates a candidate's ability to detect bugs and data-quality risks in a Python script, testing competencies in code review, reproducible randomization, idempotency, logging, duplicate-exposure safeguards, and safe experiment-assignment logic.
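The script under review is not reproduced here, so the following is only a hypothetical sketch of the safeguards named in the hints: deterministic assignment, an idempotent trigger, and logging of every decision. The names `assign_group`, `trigger_once`, and the logger name `experiment` are illustrative assumptions, and the SHA-256 bucket is a swapped-in choice that differs from the ASCII-sum bucket defined in the coding problem further down.

```python
import hashlib
import logging

# Hypothetical logger name; a real script would configure handlers elsewhere.
logger = logging.getLogger("experiment")


def assign_group(experiment_seed: str, user_id: str, treatment_rate: int) -> str:
    """Deterministically map a user to 'treatment' or 'control'.

    Assumption: a SHA-256 digest stands in for the bucket function. Unlike the
    built-in hash() on strings, it gives the same value in every process.
    """
    key = f"{experiment_seed}|{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "treatment" if bucket < treatment_rate else "control"


def trigger_once(user_id: str, already_triggered: set) -> bool:
    """Return True only the first time a user is triggered, and log both outcomes."""
    if user_id in already_triggered:
        logger.info("skipping duplicate trigger for user %s", user_id)
        return False
    already_triggered.add(user_id)
    logger.info("triggering free trial for user %s", user_id)
    return True
```

Because the assignment depends only on the seed and the user ID, rerunning the script reproduces the same groups, and the `already_triggered` set (an in-memory stand-in for whatever persistent store a real pipeline would use) keeps a rerun from exposing a user twice.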
You are given a batch of user IDs (users), an experiment seed, a treatment_rate between 0 and 100 inclusive, and a collection previously_triggered of user IDs that have already received the offer. A user is assigned to treatment if bucket(seed, user) < treatment_rate, and to control otherwise. For this problem, bucket(seed, user) = (sum of the ASCII codes of all characters in seed + '|' + user) % 100.
Return, in the order of the input batch, the list of user IDs that should trigger a free trial now, observing both rules:
(1) Idempotency across calls: do not trigger users present in previously_triggered.
(2) Duplicate suppression within the batch: return each user at most once, even if the user appears multiple times in the input.
Do not modify the inputs.
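As a quick illustration of the bucket definition, here is a minimal sketch with one worked value. The helper name `is_treatment` and the sample seed and user ID are assumptions made for the example.

```python
def bucket(seed: str, user: str) -> int:
    """Sum of the character codes of seed + '|' + user, modulo 100."""
    return sum(ord(c) for c in seed + "|" + user) % 100


def is_treatment(seed: str, user: str, treatment_rate: int) -> bool:
    """A user is in treatment when the bucket falls below treatment_rate."""
    return bucket(seed, user) < treatment_rate


# Worked example: "exp1" sums to 382, "|" is 124, "u1" sums to 166.
# 382 + 124 + 166 = 672, and 672 % 100 = 72.
print(bucket("exp1", "u1"))            # 72
print(is_treatment("exp1", "u1", 50))  # False, since 72 >= 50
print(is_treatment("exp1", "u1", 73))  # True, since 72 < 73
```

Since the bucket is always in 0..99, a treatment_rate of 0 places every user in control and a treatment_rate of 100 places every user in treatment.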
##### Constraints
- 1 <= len(users) <= 200000
- 0 <= treatment_rate <= 100
- 0 <= len(previously_triggered) <= 200000
- Each user_id is a non-empty ASCII string of length <= 64
- Deterministic bucket: bucket(seed, user) = (sum(ord(c) for c in seed + '|' + user)) % 100
- Return users in the order of first eligible appearance in the batch
- Trigger at most once per user in a single batch
##### Hints
- Use a deterministic bucket function; avoid Python's built-in hash due to runtime randomization.
- Use a set for previously_triggered and for users triggered in this batch to enforce idempotency and duplicate suppression.
- Compute a reusable base from the seed once to avoid repeated work.
- Process users in order and append only on the first eligible occurrence.
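Putting these hints together, one possible solution looks like the sketch below. The function name `users_to_trigger` and its parameter order are assumptions, since the problem statement does not fix a signature.

```python
from typing import Iterable, List


def users_to_trigger(
    users: Iterable[str],
    seed: str,
    treatment_rate: int,
    previously_triggered: Iterable[str],
) -> List[str]:
    """Return users to trigger now, in order of first eligible appearance."""
    # Reusable base: the seed and the '|' separator contribute the same
    # amount to every user's bucket, so compute that part once.
    base = (sum(ord(c) for c in seed) + ord("|")) % 100

    # Copy into a new set so the caller's collection is never modified.
    blocked = set(previously_triggered)
    triggered_now = set()
    result = []

    for user in users:
        # Idempotency across calls and duplicate suppression within the batch.
        if user in blocked or user in triggered_now:
            continue
        if (base + sum(ord(c) for c in user)) % 100 < treatment_rate:
            triggered_now.add(user)
            result.append(user)
    return result


# With seed "exp1": bucket("u1") = 72, bucket("u2") = 73, bucket("u3") = 74.
print(users_to_trigger(["u1", "u2", "u1", "u3"], "exp1", 74, ["u2"]))  # ['u1']
```

Each user ID has at most 64 characters, so the loop does a bounded amount of work per user and the set lookups are constant time on average, which is comfortable for batches of 200000 users.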