What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Take-home Project rounds at OpenAI.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at OpenAI during technical interviews.

Identify Bugs in Python Script for User Assignment

Quick Overview

This question evaluates a candidate's ability to detect bugs and data-quality risks in a Python script, testing competencies in code review, reproducible randomization, idempotency, logging, duplicate-exposure safeguards, and safe experiment-assignment logic.

Company: OpenAI

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Take-home Project

##### Scenario

A simple Python script assigns users to experiment groups and triggers the free-trial offer.

##### Question

Inspect the script and list any bugs or data-quality risks you notice. Suggest concrete refactors or additional checks that would make the assignment and triggering logic safer and more reproducible.

##### Hints

Think about seeding randomness, idempotency, logging, and duplicate-exposure safeguards.

You are given a batch of user IDs (users), an experiment seed, a treatment_rate between 0 and 100, and previously_triggered, the users who have already been triggered in earlier calls. A user is assigned to treatment if bucket(seed, user) < treatment_rate, and to control otherwise. For this problem, bucket(seed, user) is the sum of the ASCII codes of all characters in the string seed + '|' + user, taken modulo 100. Return the list of user IDs that should trigger a free trial now, in the order of the input batch, observing both rules: (1) Idempotency across calls: do not trigger users present in previously_triggered. (2) Duplicate suppression within the batch: a user appears at most once in the returned list, even if that user appears multiple times in the input. Do not modify the inputs.
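The bucket rule can be checked by hand on a small case. The sketch below is one possible reading of the definition above; the helper name `assign` and the sample seed and user IDs are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def bucket(seed: str, user: str) -> int:
    # Sum the ASCII codes of every character in seed + '|' + user, then take mod 100.
    return sum(ord(c) for c in seed + "|" + user) % 100


def assign(seed: str, user: str, treatment_rate: int) -> str:
    # Hypothetical helper: treatment if the bucket falls strictly below the rate.
    return "treatment" if bucket(seed, user) < treatment_rate else "control"


# Hypothetical inputs: seed "a", users "b" and "c", treatment_rate 20.
# "a|b" -> 97 + 124 + 98 = 319 -> bucket 19 -> treatment
# "a|c" -> 97 + 124 + 99 = 320 -> bucket 20 -> control (20 is not < 20)
print(bucket("a", "b"), assign("a", "b", 20))
print(bucket("a", "c"), assign("a", "c", 20))
```

Because the comparison is strict, a treatment_rate of 0 places every user in control and a treatment_rate of 100 places every user in treatment.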

Constraints

1 <= len(users) <= 200000
0 <= treatment_rate <= 100
0 <= len(previously_triggered) <= 200000
Each user_id is a non-empty ASCII string of length <= 64
Deterministic bucket: bucket(seed, user) = (sum(ord(c) for c in seed + '|' + user)) % 100
Return users in the order of first eligible appearance in the batch
Trigger at most once per user in a single batch

Hints

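Use a deterministic bucket function; avoid Python's built-in hash(), because string hashes are randomized per process unless PYTHONHASHSEED is set, so assignments would not be reproducible across runs.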
Use a set for previously_triggered and for users triggered in this batch to enforce idempotency and duplicate suppression.
Compute a reusable base from the seed once to avoid repeated work.
Process users in order and append only on the first eligible occurrence.
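Taken together, the hints above suggest a single pass over the batch. The following is a minimal sketch, not an official solution: the function name `users_to_trigger` and its parameter order are assumptions, and it assumes that only users assigned to treatment are eligible to trigger.

```python
from typing import Iterable, List


def users_to_trigger(
    users: Iterable[str],
    seed: str,
    treatment_rate: int,
    previously_triggered: Iterable[str],
) -> List[str]:
    # Hypothetical signature; the problem statement does not fix one.
    # The seed and separator contribute the same amount for every user,
    # so that part of the sum is computed once.
    base = sum(ord(c) for c in seed) + ord("|")

    # Copy into a new set so the caller's collection is never modified.
    already = set(previously_triggered)
    triggered_now = set()
    result = []

    for user in users:
        # Idempotency across calls and duplicate suppression within the batch.
        if user in already or user in triggered_now:
            continue
        # Same value as bucket(seed, user) in the problem definition.
        if (base + sum(ord(c) for c in user)) % 100 < treatment_rate:
            triggered_now.add(user)
            result.append(user)

    return result


# Hypothetical batch: with seed "a", users "b", "c", "d" land in buckets 19, 20, 21.
print(users_to_trigger(["b", "c", "b", "d"], "a", 22, ["c"]))
```

With set lookups, the pass takes time proportional to the total number of characters across the batch, which should be comfortable for 200000 users of length at most 64. In the sample call, "c" is skipped because it was triggered previously and the second "b" is skipped as a duplicate.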

Last updated: Mar 29, 2026
