PracHub

Quick Overview

This question evaluates a candidate's ability to detect bugs and data-quality risks in a Python script, testing competencies in code review, reproducible randomization, idempotency, logging, duplicate-exposure safeguards, and safe experiment-assignment logic.

Identify Bugs in Python Script for User Assignment

Company: OpenAI

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Take-home Project

##### Scenario

A simple Python script assigns users to experiment groups and triggers the free-trial offer.

##### Question

Inspect the script and list any bugs or data-quality risks you notice. Suggest concrete refactors or additional checks that would make the assignment and triggering logic safer and more reproducible.

##### Hints

Think about seeding randomness, idempotency, logging, and duplicate-exposure safeguards.
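The script under review is not reproduced on this page, so the following is only a hypothetical illustration of the seeding hint. It assumes the script draws assignments from the `random` module. The function names `assign_sequential` and `assign_hashed` are invented for this sketch, and the SHA-256 bucketing is a swapped-in technique, not the ASCII-sum bucket this problem defines.

```python
import hashlib
import random


def assign_sequential(users, seed, treatment_rate):
    """Seeded but order-dependent: a user's group depends on batch position."""
    rng = random.Random(seed)  # a rerun on the SAME list reproduces the result
    return {
        u: "treatment" if rng.randrange(100) < treatment_rate else "control"
        for u in users
    }


def assign_hashed(users, seed, treatment_rate):
    """Order-independent: a user's group depends only on (seed, user)."""
    groups = {}
    for u in users:
        digest = hashlib.sha256(f"{seed}|{u}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
        groups[u] = "treatment" if bucket < treatment_rate else "control"
    return groups


users = ["u1", "u2", "u3", "u4"]
# Reordering the batch cannot change any user's group under hashing.
print(assign_hashed(users, "exp-1", 50) == assign_hashed(users[::-1], "exp-1", 50))
```

Seeding a generator makes a rerun on identical input reproducible, but adding, removing, or reordering users can shift everyone's draw. Deriving the bucket from the seed and the user ID keeps each user's assignment stable across batches, which is what idempotent triggering relies on.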

You are given a batch of user IDs (users), an experiment seed, a treatment_rate between 0 and 100 inclusive, and previously_triggered, the collection of users who have already been triggered. A user is assigned to treatment if bucket(seed, user) < treatment_rate, and to control otherwise. For this problem, bucket(seed, user) is the sum of the ASCII codes of every character in the string seed + '|' + user, taken modulo 100.

Return the list of user IDs that should trigger a free trial now, in the order of the input batch, observing both rules:

  1. Idempotency across calls: do not trigger users present in previously_triggered.
  2. Duplicate suppression within the batch: trigger each user at most once in the returned list, even if the user appears multiple times in the input.

Do not modify the inputs.
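As a worked instance of the bucket definition, here is a minimal sketch. The helper name `assign` and the sample values `"exp"` and `"u1"` are made up for illustration; only the `bucket` formula comes from the problem statement.

```python
def bucket(seed: str, user: str) -> int:
    """Sum of the character codes of seed + '|' + user, modulo 100."""
    return sum(ord(c) for c in seed + "|" + user) % 100


def assign(seed: str, user: str, treatment_rate: int) -> str:
    """Treatment when the bucket falls strictly below the rate."""
    return "treatment" if bucket(seed, user) < treatment_rate else "control"


# "exp|u1" -> 101 + 120 + 112 + 124 + 117 + 49 = 623, and 623 % 100 = 23
print(bucket("exp", "u1"))      # → 23
print(assign("exp", "u1", 50))  # → treatment
print(assign("exp", "u1", 23))  # → control (23 < 23 is false)
```

Because the comparison is strict and buckets range over 0..99, a treatment_rate of 0 puts every user in control and a treatment_rate of 100 puts every user in treatment.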

Constraints

  • 1 <= len(users) <= 200000
  • 0 <= treatment_rate <= 100
  • 0 <= len(previously_triggered) <= 200000
  • Each user_id is a non-empty ASCII string of length <= 64
  • Deterministic bucket: bucket(seed, user) = (sum(ord(c) for c in seed + '|' + user)) % 100
  • Return users in the order of first eligible appearance in the batch
  • Trigger at most once per user in a single batch

Hints

  1. Use a deterministic bucket function; avoid Python's built-in hash due to runtime randomization.
  2. Use a set for previously_triggered and for users triggered in this batch to enforce idempotency and duplicate suppression.
  3. Compute a reusable base from the seed once to avoid repeated work.
  4. Process users in order and append only on the first eligible occurrence.
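The four hints above can be combined into one pass over the batch. This is a sketch under assumptions rather than a reference solution: the function name `users_to_trigger` and its parameter order are hypothetical, since the page does not show the expected signature.

```python
from typing import Iterable, List


def users_to_trigger(
    users: List[str],
    seed: str,
    treatment_rate: int,
    previously_triggered: Iterable[str],
) -> List[str]:
    # Hint 3: the seed and separator contribute the same amount for every user.
    base = sum(ord(c) for c in seed) + ord("|")
    # Hint 2: copy into a new set so the caller's collection is never mutated.
    blocked = set(previously_triggered)
    result: List[str] = []
    # Hint 4: walk the batch in order, appending on first eligible occurrence.
    for user in users:
        if user in blocked:
            continue  # already triggered earlier, or earlier in this batch
        # Hint 1: deterministic bucket, no reliance on built-in hash().
        if (base + sum(ord(c) for c in user)) % 100 < treatment_rate:
            result.append(user)
            blocked.add(user)
    return result


batch = ["u1", "a", "u2", "u1", "u9"]
# Buckets with seed "exp": u1 -> 23, a -> 54, u2 -> 24, u9 -> 31
print(users_to_trigger(batch, "exp", 30, ["u2"]))  # → ['u1']
print(users_to_trigger(batch, "exp", 32, []))      # → ['u1', 'u2', 'u9']
```

A single set serves both rules: it starts as a copy of previously_triggered and grows as users are triggered in the current batch. Control users are never added to it, so a repeated control user is simply re-bucketed and skipped again.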
Last updated: Mar 29, 2026
