Identify Bugs in Python Script for User Assignment
Company: OpenAI
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Take-home Project
Quick Answer: This question evaluates a candidate's ability to detect bugs and data-quality risks in a Python script, testing competencies in code review, reproducible randomization, idempotency, logging, duplicate-exposure safeguards, and safe experiment-assignment logic.
Constraints
- 1 <= len(users) <= 200000
- 0 <= treatment_rate <= 100
- 0 <= len(previously_triggered) <= 200000
- Each user_id is a non-empty ASCII string of length <= 64
- Deterministic bucket: bucket(seed, user) = (sum(ord(c) for c in seed + '|' + user)) % 100
- Return users in the order of first eligible appearance in the batch
- Trigger at most once per user in a single batch
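The bucket rule can be checked by hand. The sketch below evaluates the formula in two ways: literally, and with the seed prefix precomputed as the solution does. The helper names `bucket_direct` and `bucket_with_base` are hypothetical and not part of the original task.

```python
def bucket_direct(seed: str, user: str) -> int:
    # Literal reading of the constraint: sum the code points of seed + '|' + user
    return sum(ord(c) for c in seed + "|" + user) % 100


def bucket_with_base(base: int, user: str) -> int:
    # Same value, reusing a precomputed sum for the seed + '|' prefix
    return (base + sum(ord(c) for c in user)) % 100


seed = "s"
base = sum(ord(c) for c in seed) + ord("|")  # 115 + 124 = 239

# 's' (115) + '|' (124) + 'a' (97) = 336, and 336 % 100 = 36
print(bucket_direct(seed, "a"))     # 36
print(bucket_with_base(base, "a"))  # 36
```

Because the bucket is a plain character sum, it ignores character order, so `"ab"` and `"ba"` always land in the same bucket. The constraints fix this formula, so the solution must use it as given. It is, however, a much weaker mixer than a cryptographic hash.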
Solution
```python
from typing import List


def schedule_triggers(users: List[str], seed: str, treatment_rate: int,
                      previously_triggered: List[str]) -> List[str]:
    # Precompute base = sum of ord(c) over seed + '|' once, not once per user
    base = sum(ord(c) for c in seed) + ord('|')
    prior = set(previously_triggered)  # O(1) lookups for earlier triggers
    triggered_now = set()              # users triggered in this batch
    result: List[str] = []
    for u in users:
        # Skip if already triggered before or already triggered in this batch
        if u in prior or u in triggered_now:
            continue
        # Deterministic bucket, exactly as defined in the constraints
        bucket = (base + sum(ord(c) for c in u)) % 100
        if bucket < treatment_rate:
            triggered_now.add(u)
            result.append(u)
    return result
```
Explanation
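Time complexity: O(|seed| + (n + P) · L), where n is the number of users, P is len(previously_triggered), and L is the average user_id length. Each bucket sums L character codes, and each set insert or lookup hashes a string in O(L). Memoizing the decision per distinct user_id would skip recomputing the sum for repeated IDs, but it is not required. Space complexity: O(m) for the sets and the output, where m is the number of unique users that are either triggered or present in previously_triggered.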
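The memoization remark can be sketched as follows. The approach shown here is an assumption: it records every user_id already decided in the batch, whether eligible or not. Eligibility depends only on (seed, user_id), so a repeat can never change the outcome, and the output matches the uncached solution. The function name `schedule_triggers_memo` is hypothetical.

```python
from typing import List


def schedule_triggers_memo(users: List[str], seed: str, treatment_rate: int,
                           previously_triggered: List[str]) -> List[str]:
    base = sum(ord(c) for c in seed) + ord("|")
    prior = set(previously_triggered)
    # Every user_id decided in this batch, eligible or not; a repeat is skipped
    # without recomputing its character sum
    decided = set()
    result: List[str] = []
    for u in users:
        if u in decided or u in prior:
            continue
        decided.add(u)
        if (base + sum(ord(c) for c in u)) % 100 < treatment_rate:
            result.append(u)
    return result


# With seed "s": buckets are a -> 36, b -> 37, c -> 38
print(schedule_triggers_memo(["a", "b", "a", "c"], "s", 38, []))  # ['a', 'b']
```

This saves work only when a batch repeats IDs. Hashing for the set lookups still costs O(L) per user, so the asymptotic bound does not change.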
Hints
- Use a deterministic bucket function. Avoid Python's built-in hash(), because str hashes are randomized per process (see PYTHONHASHSEED), so assignments would change between runs.
- Use a set for previously_triggered and for users triggered in this batch to enforce idempotency and duplicate suppression.
- Compute a reusable base from the seed once to avoid repeated work.
- Process users in order and append only on the first eligible occurrence.
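The idempotency hints matter most when the job runs repeatedly. The sketch below carries state across batches so that re-running a batch, or processing one that overlaps an earlier batch, never re-triggers a user. It assumes the triggered IDs live in an in-memory set; a real pipeline would persist them in durable storage. The class name `BatchTriggerScheduler` is hypothetical.

```python
from typing import Iterable, List, Set


class BatchTriggerScheduler:
    """Hypothetical stateful wrapper that remembers every user already triggered."""

    def __init__(self, seed: str, treatment_rate: int) -> None:
        self.base = sum(ord(c) for c in seed) + ord("|")  # seed + '|' prefix sum
        self.treatment_rate = treatment_rate
        self.triggered: Set[str] = set()  # stands in for durable storage

    def run_batch(self, users: Iterable[str]) -> List[str]:
        newly: List[str] = []
        for u in users:
            if u in self.triggered:
                continue  # triggered earlier in this batch or in a previous one
            if (self.base + sum(ord(c) for c in u)) % 100 < self.treatment_rate:
                self.triggered.add(u)
                newly.append(u)
        return newly


# With seed "s": buckets are a -> 36, b -> 37, c -> 38, d -> 39
scheduler = BatchTriggerScheduler(seed="s", treatment_rate=40)
print(scheduler.run_batch(["a", "b", "a", "c"]))  # ['a', 'b', 'c']
print(scheduler.run_batch(["a", "b", "a", "c"]))  # [] (the rerun is a no-op)
print(scheduler.run_batch(["c", "d", "b"]))       # ['d']
```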