PracHub

Quick Overview

This question evaluates a candidate's ability to detect bugs and data-quality risks in a Python script, testing competencies in code review, reproducible randomization, idempotency, logging, duplicate-exposure safeguards, and safe experiment-assignment logic.

Identify Bugs in Python Script for User Assignment

Company: OpenAI

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Take-home Project

##### Scenario

A simple Python script assigns users to experiment groups and triggers the free-trial offer.

##### Question

Inspect the script and list any bugs or data-quality risks you notice. Suggest concrete refactors or additional checks that would make the assignment and triggering logic safer and more reproducible.

##### Hints

Think about seeding randomness, idempotency, logging, and duplicate-exposure safeguards.

You are given a batch of user IDs (users), an experiment seed, a treatment_rate between 0 and 100 inclusive, and previously_triggered, the users already triggered by earlier calls. A user is assigned to treatment if bucket(seed, user) < treatment_rate, and to control otherwise. For this problem, bucket(seed, user) = (sum of the ASCII codes of all characters in seed + '|' + user) % 100.

Return, in the order of the input batch, the list of user IDs that should trigger a free trial now, subject to two rules:

  1. Idempotency across calls: do not trigger users present in previously_triggered.
  2. Duplicate suppression within the batch: trigger a user at most once in the returned list, even if the user appears multiple times in the input.

Do not modify the inputs.
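To make the bucket definition concrete, here is a minimal sketch of it with a hand-checkable value. The seed "s" and the user IDs "a" and "b" are hypothetical values chosen only so the arithmetic is easy to follow.

```python
def bucket(seed: str, user: str) -> int:
    """ASCII-sum bucket as defined in the problem statement."""
    return sum(ord(c) for c in seed + "|" + user) % 100


# Hypothetical inputs: ord('s') = 115, ord('|') = 124, ord('a') = 97.
# 115 + 124 + 97 = 336, and 336 % 100 = 36.
b = bucket("s", "a")
print(b)  # 36

# The comparison is strict, so a user with bucket 36 is in treatment
# at treatment_rate = 37 but in control at treatment_rate = 36.
print(b < 37, b < 36)  # True False
```

Because the comparison is strict and buckets range over 0..99, treatment_rate = 0 puts every user in control and treatment_rate = 100 puts every user in treatment.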

Constraints

  • 1 <= len(users) <= 200000
  • 0 <= treatment_rate <= 100
  • 0 <= len(previously_triggered) <= 200000
  • Each user_id is a non-empty ASCII string of length <= 64
  • Deterministic bucket: bucket(seed, user) = (sum(ord(c) for c in seed + '|' + user)) % 100
  • Return users in the order of first eligible appearance in the batch
  • Trigger at most once per user in a single batch

Solution

from typing import List


def schedule_triggers(users: List[str], seed: str, treatment_rate: int, previously_triggered: List[str]) -> List[str]:
    # Precompute base = sum(ord) for seed + '|'
    base = sum(ord(c) for c in seed) + ord('|')

    prior = set(previously_triggered)
    triggered_now = set()
    result: List[str] = []

    for u in users:
        # Skip if already triggered before or already triggered in this batch
        if u in prior or u in triggered_now:
            continue
        # Deterministic bucket per statement
        bucket = (base + sum(ord(c) for c in u)) % 100
        if bucket < treatment_rate:
            triggered_now.add(u)
            result.append(u)
    return result

Explanation

Compute a deterministic bucket using the ASCII-sum of seed, a separator '|', and the user ID, modulo 100. A user triggers if the bucket is below treatment_rate, provided they have not triggered before and have not been triggered already within this batch. Sets provide O(1) membership checks for idempotency and duplicate suppression, and the result preserves input order by appending on the first eligible occurrence.
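The two rules can be traced on a small batch. The sketch below restates the solution compactly so that it runs on its own; the seed, rate, and user IDs are hypothetical and were picked so that each skip path is exercised once.

```python
from typing import List


def schedule_triggers(users: List[str], seed: str, treatment_rate: int,
                      previously_triggered: List[str]) -> List[str]:
    # Compact restatement of the solution so this example is self-contained.
    base = sum(ord(c) for c in seed) + ord("|")
    prior = set(previously_triggered)
    triggered_now = set()
    result: List[str] = []
    for u in users:
        if u in prior or u in triggered_now:
            continue
        if (base + sum(ord(c) for c in u)) % 100 < treatment_rate:
            triggered_now.add(u)
            result.append(u)
    return result


# Hypothetical inputs. With seed "s" the buckets are a=36, b=37, o=50, c=38.
users = ["a", "b", "a", "o", "c"]
previously_triggered = ["b"]

first_call = schedule_triggers(users, "s", 50, previously_triggered)
# "b" was triggered earlier, the second "a" is a duplicate, "o" is in control.
print(first_call)  # ['a', 'c']

# Feeding the result back in makes a retry of the same batch a no-op.
second_call = schedule_triggers(users, "s", 50, previously_triggered + first_call)
print(second_call)  # []
```

The second call illustrates why the caller has to persist the returned IDs: idempotency across calls holds only if previously_triggered reflects every earlier trigger.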

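Time complexity: O(n * L), where n is the number of users in the batch and L is the average user_id length (the cost of each ASCII sum), plus the cost of building the set from previously_triggered. Memoizing the decision per distinct user would avoid recomputing the sum for repeated control users, but this is not required.

Space complexity: O(m) for the sets and the output, where m is the number of unique users triggered or present in previously_triggered.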
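One way to get the per-distinct-user saving mentioned in the complexity note is to remember every user already decided in this batch, not only the triggered ones. This is a sketch under that assumption; the name schedule_triggers_seen is hypothetical, and the output matches the original solution because the bucket is deterministic.

```python
from typing import List, Set


def schedule_triggers_seen(users: List[str], seed: str, treatment_rate: int,
                           previously_triggered: List[str]) -> List[str]:
    base = sum(ord(c) for c in seed) + ord("|")
    # set(...) copies, so the caller's previously_triggered is not modified.
    seen: Set[str] = set(previously_triggered)
    result: List[str] = []
    for u in users:
        if u in seen:
            continue
        seen.add(u)  # decided once per distinct user, whatever the outcome
        if (base + sum(ord(c) for c in u)) % 100 < treatment_rate:
            result.append(u)
    return result


# Hypothetical batch: "o" has bucket 50 (control at rate 50) and repeats three
# times, but its ASCII sum is computed only on the first occurrence.
print(schedule_triggers_seen(["o", "o", "a", "o", "a"], "s", 50, []))  # ['a']
```

The trade-off is extra memory: the seen set grows with the number of distinct users in the batch rather than only with the number triggered.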

Hints

  1. Use a deterministic bucket function; avoid Python's built-in hash due to runtime randomization.
  2. Use a set for previously_triggered and for users triggered in this batch to enforce idempotency and duplicate suppression.
  3. Compute a reusable base from the seed once to avoid repeated work.
  4. Process users in order and append only on the first eligible occurrence.
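As a side note on hint 1: outside this exercise, a common deterministic replacement for the built-in hash is a cryptographic digest from hashlib. The sketch below is an alternative, not the bucket this problem defines, and hashed_bucket is a hypothetical name. It also shows one data-quality property of the ASCII-sum bucket: user IDs that are anagrams of each other always land in the same bucket.

```python
import hashlib


def ascii_bucket(seed: str, user: str) -> int:
    # The bucket defined in the problem statement.
    return sum(ord(c) for c in seed + "|" + user) % 100


def hashed_bucket(seed: str, user: str) -> int:
    # Assumption: SHA-256 over "seed|user", first 8 bytes as a big-endian int.
    digest = hashlib.sha256(f"{seed}|{user}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


# The ASCII sum ignores character order, so anagrams collide by construction.
print(ascii_bucket("s", "ab") == ascii_bucket("s", "ba"))  # True

# The digest-based bucket is stable across processes and runs,
# unlike hash() on str, which is randomized per process by default.
print(hashed_bucket("s", "ab") == hashed_bucket("s", "ab"))  # True
```

For the graded problem, the ASCII-sum definition is still the one to implement, since the expected outputs depend on it.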
Last updated: Mar 29, 2026
