Optimize invites under capacity constraints
Company: Capital One
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates a candidate's competency in probabilistic expected-value optimization, algorithmic efficiency for large-scale inputs, and constrained combinatorial selection including grouped all-or-nothing decisions and cost modeling.
Constraints
- 0 <= n <= 100000
- All four probability/amount arrays have the same length n
- 0 <= p_online[i], p_gala[i] <= 1
- a_online[i], a_gala[i] >= 0
- 0 <= K (gala capacity); typically K = 100
- c_online, c_gala, F_gala >= 0
- The gala fixed cost F_gala is always paid (the gala is held regardless of |G|)
Examples
Input: ([0.5], [100], [0.8], [1000], 1.0, 100.0, 20000.0, 100)
Expected Output: -19300.0
Explanation: One donor. Online value = 0.5*100 - 1 = 49. Gala value = 0.8*1000 - 100 = 700, so delta = 651 > 0 and the donor is moved to the gala. base = 49 - 20000 = -19951; total = -19951 + 651 = -19300. The large fixed cost dominates.
Input: ([0.1, 0.1, 0.1], [10, 10, 10], [0.9, 0.9, 0.9], [1000, 1000, 1000], 1.0, 100.0, 0.0, 2)
Expected Output: 1600.0
Explanation: Each online value = 0.1*10 - 1 = 0; each gala value = 0.9*1000 - 100 = 800, so delta = 800. K = 2 caps the gala at two donors. base = 0; total = 0 + 800 + 800 = 1600. The third donor stays online despite a positive delta because of the capacity limit.
Input: ([0.5, 0.5, 0.5], [200, 200, 200], [0.1, 0.1, 0.1], [50, 50, 50], 2.0, 100.0, 500.0, 3)
Expected Output: -206.0
Explanation: Online value = 0.5*200 - 2 = 98; gala value = 0.1*50 - 100 = -95, so delta = -193 < 0 for everyone. No donor is moved to the gala. total = 3*98 - 500 = -206.
Input: ([], [], [], [], 1.0, 100.0, 1000.0, 100)
Expected Output: -1000.0
Explanation: Edge case: zero donors. base = -F_gala = -1000; no deltas to add. total = -1000.
Input: ([0.2, 0.6, 0.9, 0.3], [50, 80, 120, 40], [0.7, 0.4, 0.95, 0.5], [500, 600, 800, 300], 1.5, 80.0, 5000.0, 2)
Expected Output: -3993.0
Explanation: Online values = [8.5, 46.5, 106.5, 10.5], gala values = [270, 160, 680, 70], deltas = [261.5, 113.5, 573.5, 59.5]. base = 172 - 5000 = -4828. K = 2, so add the two largest positive deltas 573.5 + 261.5 = 835: total = -4828 + 835 = -3993.
Input: ([0.5, 0.5], [100, 100], [0.5, 0.5], [100, 100], 5.0, 5.0, 0.0, 2)
Expected Output: 90.0
Explanation: Symmetric channels: online and gala values are both 0.5*100 - 5 = 45, so every delta = 0. Since delta is not strictly positive, no donor is moved (it makes no difference). total = 45 + 45 = 90.
Hints
- Rank donors by Delta_i = (p_gala[i]*a_gala[i] - c_gala) - (p_online[i]*a_online[i] - c_online): this is the marginal gain of moving donor i from email to the gala.
- Since the only constraint is the count |G| <= K, the moves are independent — there is no interaction between donors — so a greedy pick of the top-K positive deltas is optimal.
- Compute the all-online baseline first (sum of online values minus F_gala), then add only the positive deltas among the top K. Never take a delta <= 0.
- Sorting the deltas is the O(n log n) bottleneck; you could also use a partial selection (e.g. a size-K heap) for O(n log K).