PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates a candidate's competency in probabilistic expected-value optimization, algorithmic efficiency for large-scale inputs, and constrained combinatorial selection including grouped all-or-nothing decisions and cost modeling.

  • Medium
  • Capital One
  • Coding & Algorithms
  • Data Scientist

Optimize invites under capacity constraints

Company: Capital One

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

You have n donors (n up to 100,000). For each donor i you know: p_online[i] (probability of donating if emailed), a_online[i] (expected donation conditional on donating), p_gala[i], a_gala[i]. Costs: c_online per person emailed, c_gala per gala attendee, plus gala fixed cost F_gala. You may email everyone, but you can invite at most K=100 donors to the gala; a donor cannot both be emailed and invited to the gala. Choose a set G (|G| ≤ K) to maximize expected total net revenue: sum_{i∈G}(p_gala[i]*a_gala[i] - c_gala) + sum_{i∉G}(p_online[i]*a_online[i] - c_online) - F_gala. Tasks: 1) Describe an O(n log n) algorithm to choose G. Provide correctness intuition and complexity. (Hint: rank donors by Δi = (p_gala*a_gala - c_gala) - (p_online*a_online - c_online) and take the top K with positive Δi.) 2) Extend your approach if some donors belong to households that must be invited as a group (all-or-nothing) with varying group sizes; state how this becomes a 0/1 knapsack variant and propose an algorithm or approximation suitable for n=1e5. 3) Explain how you'd incorporate uncertainty in p and a (e.g., prediction intervals) to produce a risk-aware selection.

Quick Answer: This question evaluates a candidate's competency in probabilistic expected-value optimization, algorithmic efficiency for large-scale inputs, and constrained combinatorial selection including grouped all-or-nothing decisions and cost modeling.

A nonprofit can reach each of `n` donors either by **email (online)** or by **inviting them to a gala**, but not both. For each donor `i` you are given four parallel arrays: `p_online[i]` (probability of donating if emailed), `a_online[i]` (expected gift amount conditional on donating online), `p_gala[i]`, and `a_gala[i]` (the gala analogues). There are per-person costs `c_online` (per emailed donor) and `c_gala` (per gala attendee), plus a one-time gala fixed cost `F_gala`. You may email everyone, but the gala venue holds at most `K` donors, so you may invite at most `K` donors to the gala. Let the gala be held regardless (so `F_gala` is always paid). Choose a set `G` with `|G| <= K` to maximize the expected total net revenue: sum_{i in G} (p_gala[i]*a_gala[i] - c_gala) + sum_{i not in G} (p_online[i]*a_online[i] - c_online) - F_gala Return the maximum achievable expected total net revenue, rounded to 6 decimal places. **Approach (O(n log n)):** Start from the baseline where every donor is emailed: `base = sum_i (p_online[i]*a_online[i] - c_online) - F_gala`. Moving donor `i` from email to the gala changes the objective by `delta_i = (p_gala[i]*a_gala[i] - c_gala) - (p_online[i]*a_online[i] - c_online)`. Because the only constraint is the count `|G| <= K`, the moves are independent: greedily apply the moves with the largest positive `delta_i`, up to `K` of them, stopping once `delta_i <= 0`. Sorting the deltas dominates at O(n log n).

Constraints

  • 0 <= n <= 100000
  • All four probability/amount arrays have the same length n
  • 0 <= p_online[i], p_gala[i] <= 1
  • a_online[i], a_gala[i] >= 0
  • 0 <= K (gala capacity); typically K = 100
  • c_online, c_gala, F_gala >= 0
  • The gala fixed cost F_gala is always paid (the gala is held regardless of |G|)

Examples

Input: ([0.5], [100], [0.8], [1000], 1.0, 100.0, 20000.0, 100)

Expected Output: -19300.0

Explanation: One donor. Online value = 0.5*100 - 1 = 49. Gala value = 0.8*1000 - 100 = 700, so delta = 651 > 0 and the donor is moved to the gala. base = 49 - 20000 = -19951; total = -19951 + 651 = -19300. The large fixed cost dominates.

Input: ([0.1, 0.1, 0.1], [10, 10, 10], [0.9, 0.9, 0.9], [1000, 1000, 1000], 1.0, 100.0, 0.0, 2)

Expected Output: 1600.0

Explanation: Each online value = 0.1*10 - 1 = 0; each gala value = 0.9*1000 - 100 = 800, so delta = 800. K = 2 caps the gala at two donors. base = 0; total = 0 + 800 + 800 = 1600. The third donor stays online despite a positive delta because of the capacity limit.

Input: ([0.5, 0.5, 0.5], [200, 200, 200], [0.1, 0.1, 0.1], [50, 50, 50], 2.0, 100.0, 500.0, 3)

Expected Output: -206.0

Explanation: Online value = 0.5*200 - 2 = 98; gala value = 0.1*50 - 100 = -95, so delta = -193 < 0 for everyone. No donor is moved to the gala. total = 3*98 - 500 = -206.

Input: ([], [], [], [], 1.0, 100.0, 1000.0, 100)

Expected Output: -1000.0

Explanation: Edge case: zero donors. base = -F_gala = -1000; no deltas to add. total = -1000.

Input: ([0.2, 0.6, 0.9, 0.3], [50, 80, 120, 40], [0.7, 0.4, 0.95, 0.5], [500, 600, 800, 300], 1.5, 80.0, 5000.0, 2)

Expected Output: -3993.0

Explanation: Online values = [8.5, 46.5, 106.5, 10.5], gala values = [270, 160, 680, 70], deltas = [261.5, 113.5, 573.5, 59.5]. base = 172 - 5000 = -4828. K = 2, so add the two largest positive deltas 573.5 + 261.5 = 835: total = -4828 + 835 = -3993.

Input: ([0.5, 0.5], [100, 100], [0.5, 0.5], [100, 100], 5.0, 5.0, 0.0, 2)

Expected Output: 90.0

Explanation: Symmetric channels: online and gala values are both 0.5*100 - 5 = 45, so every delta = 0. Since delta is not strictly positive, no donor is moved (it makes no difference). total = 45 + 45 = 90.

Hints

  1. Rank donors by Delta_i = (p_gala[i]*a_gala[i] - c_gala) - (p_online[i]*a_online[i] - c_online): this is the marginal gain of moving donor i from email to the gala.
  2. Since the only constraint is the count |G| <= K, the moves are independent — there is no interaction between donors — so a greedy pick of the top-K positive deltas is optimal.
  3. Compute the all-online baseline first (sum of online values minus F_gala), then add only the positive deltas among the top K. Never take a delta <= 0.
  4. Sorting the deltas is the O(n log n) bottleneck; you could also use a partial selection (e.g. a size-K heap) for O(n log K).
Last updated: Jun 26, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Solve Four Coding Assessment Tasks - Capital One (medium)
  • Write SQL using joins and window functions - Capital One (medium)
  • Review Preprocessing Code and Tests - Capital One (easy)
  • Remove nodes with a given value - Capital One (medium)
  • Solve multiple algorithmic interview questions - Capital One (hard)