PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates a candidate's ability to design and analyze dynamic data structures that support weighted random sampling with inserts and deletes, testing skills in algorithm design, complexity analysis, and randomized sampling under large numeric constraints.

  • medium
  • Citadel
  • Coding & Algorithms
  • Software Engineer

Design dynamic weighted random sampling with updates

Company: Citadel

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

## Problem: Support weighted random sampling with insert/delete Design a data structure that maintains a dynamic set of items, each with a **positive integer weight**, and supports: 1. `insert(id, weight)` - Adds a new item `id` with the given `weight`. - If `id` already exists, you may either reject or treat it as an update (state your choice). 2. `delete(id)` - Removes the item if it exists. 3. `sample() -> id` - Returns an `id` chosen randomly such that: \[ P(id) = \frac{weight(id)}{\sum_j weight(j)} \] ### Requirements - Target time complexity: **O(log n)** per operation. - Must handle up to ~10^5 operations. - Weights can be large (e.g., up to 10^9); total weight may exceed 32-bit. - `sample()` should be unbiased given an ideal RNG. ### Notes - You may assume access to a function `randInt(1, totalWeight)` that returns a uniform integer in that range. - Clearly define how you map item IDs to internal indices as the set changes over time.

Quick Answer: This question evaluates a candidate's ability to design and analyze dynamic data structures that support weighted random sampling with inserts and deletes, testing skills in algorithm design, complexity analysis, and randomized sampling under large numeric constraints.

You are given a sequence of operations on a dynamic set of integer item IDs. Each active item has a positive integer weight. Implement a function that processes all operations and returns the result of every sample query. Operations: - ("insert", id, weight): Insert a new item or update the existing item's weight to the new positive value. - ("delete", id): Remove the item if it exists. If it does not exist, do nothing. - ("sample", r): Simulate weighted random sampling using a pre-generated random integer r. To make sampling deterministic for testing, active items are considered in increasing ID order. If their current weights are w1, w2, ..., then they occupy cumulative intervals [1, w1], [w1+1, w1+w2], and so on. A sample query returns the ID whose interval contains r. If the set is empty, or r is outside the range [1, totalWeight], return -1 for that sample. Your goal is to support up to about 10^5 operations efficiently.

Constraints

  • 1 <= len(operations) <= 100000
  • 1 <= weight <= 10^9 for insert operations
  • IDs are integers and may be sparse or large in magnitude
  • The total active weight can exceed 32-bit integer range
  • Insert on an existing ID must be treated as a weight update
  • Delete on a missing ID is a no-op

Examples

Input: [("insert", 10, 4), ("insert", 20, 6), ("sample", 1), ("sample", 4), ("sample", 5), ("sample", 10)]

Expected Output: [10, 10, 20, 20]

Explanation: In increasing ID order, 10 owns [1,4] and 20 owns [5,10].

Input: [("insert", 5, 3), ("insert", 2, 2), ("sample", 4), ("insert", 5, 1), ("sample", 2), ("delete", 2), ("sample", 1), ("delete", 5), ("sample", 1)]

Expected Output: [5, 2, 5, -1]

Explanation: Insert on existing ID updates its weight. After all deletions, sampling from an empty set returns -1.

Input: [("delete", 7), ("insert", 7, 1000000000), ("sample", 1), ("sample", 1000000000), ("insert", 3, 1000000000), ("sample", 1000000000), ("sample", 1000000001)]

Expected Output: [7, 7, 3, 7]

Explanation: This checks large weights and boundary values. After inserting 3, IDs are ordered as 3 then 7.

Input: [("insert", 1, 5), ("insert", 4, 2), ("insert", 3, 4), ("delete", 4), ("sample", 8), ("insert", 4, 10), ("sample", 9), ("insert", 1, 1), ("sample", 1), ("sample", 5)]

Expected Output: [3, 3, 1, 3]

Explanation: This case mixes inserts, updates, deletes, and boundary prefix sums.

Input: [("sample", 1), ("sample", 2)]

Expected Output: [-1, -1]

Explanation: Sampling from an always-empty structure returns -1 each time.

Solution

def solution(operations):
    ids = sorted({op[1] for op in operations if op[0] in ("insert", "delete")})
    n = len(ids)
    index = {item_id: i + 1 for i, item_id in enumerate(ids)}

    bit = [0] * (n + 1)
    current = {}
    total_weight = 0

    def add(i, delta):
        nonlocal total_weight
        total_weight += delta
        while i <= n:
            bit[i] += delta
            i += i & -i

    def find_by_prefix(k):
        idx = 0
        step = 1 << (n.bit_length() - 1)
        while step:
            nxt = idx + step
            if nxt <= n and bit[nxt] < k:
                idx = nxt
                k -= bit[nxt]
            step >>= 1
        return idx + 1

    answer = []

    for op in operations:
        typ = op[0]

        if typ == "insert":
            _, item_id, weight = op
            old = current.get(item_id, 0)
            current[item_id] = weight
            add(index[item_id], weight - old)

        elif typ == "delete":
            _, item_id = op
            old = current.pop(item_id, 0)
            if old:
                add(index[item_id], -old)

        else:  # sample
            _, r = op
            if total_weight <= 0 or r < 1 or r > total_weight:
                answer.append(-1)
            else:
                pos = find_by_prefix(r)
                answer.append(ids[pos - 1])

    return answer

Time complexity: O(m log m) overall for m operations, with O(log m) per update/sample after coordinate compression. Space complexity: O(m).

Hints

  1. Compress item IDs into fixed indices, then store current weights in a Fenwick tree or segment tree.
  2. A sample query is equivalent to finding the first index whose prefix-sum weight is at least r.
Last updated: Apr 19, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a single-producer multi-consumer ring buffer - Citadel (medium)
  • Sort a Nearly Sorted Array - Citadel (hard)
  • Compute BBO and NBBO from order data - Citadel (medium)
  • Compute maximum later-earlier difference - Citadel (medium)
  • Implement LRU/LFU cache with custom eviction - Citadel (easy)