Implement percentage RMSE and bootstrap its CI
Company: Google
Role: Data Scientist
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates implementation and statistical reasoning skills, focusing on robust numerical computation of percentage RMSE (including handling zeros/negatives, optional country weights, and numerical stability) and nonparametric bootstrap methods for confidence intervals.
Part 1: Implement percentage RMSE with filtering, weights, and numerical stability
Constraints
- 0 <= len(rows) <= 200000
- -10^15 <= actual_revenue, predicted_revenue <= 10^15
- Weights may be omitted; missing countries default to weight 1.0
- Rows with actual_revenue <= 0, predicted_revenue < 0, or weight <= 0 must be ignored
Examples
Input: ([('US', 100.0, 110.0), ('CA', 200.0, 180.0)], None)
Expected Output: 0.1
Explanation: Both rows have relative error magnitude 0.1, so pRMSE = sqrt((0.01 + 0.01) / 2) = 0.1.
Input: ([('A', 100.0, 120.0), ('B', 100.0, 100.0)], {'A': 3.0, 'B': 1.0})
Expected Output: 0.173205080757
Explanation: Weighted mean square error is (3*0.04 + 1*0.0) / 4 = 0.03, so pRMSE = sqrt(0.03).
Input: ([('A', 0.0, 10.0), ('B', -5.0, 10.0), ('C', 50.0, -1.0), ('D', 100.0, 120.0)], {'D': 0.0})
Expected Output: None
Explanation: Every row is invalid after applying the filtering rules.
Input: ([('X', 1000000000000.0, 1000000000001.0), ('Y', 1000000000000.0, 999999999999.0)], None)
Expected Output: 1e-12
Explanation: The two relative errors are +1e-12 and -1e-12, giving a pRMSE of exactly 1e-12.
Solution
def solution(rows, weights=None):
import math
if weights is None:
weights = {}
weighted_sq_errors = []
weight_values = []
for country, actual, predicted in rows:
w = float(weights.get(country, 1.0))
actual = float(actual)
predicted = float(predicted)
if actual <= 0.0 or predicted < 0.0 or w <= 0.0:
continue
rel = (predicted - actual) / actual
weighted_sq_errors.append(w * rel * rel)
weight_values.append(w)
total_weight = math.fsum(weight_values)
if total_weight == 0.0:
return None
mean_sq = math.fsum(weighted_sq_errors) / total_weight
return round(math.sqrt(mean_sq), 12)
Time complexity: O(n). Space complexity: O(n).
Hints
- The expression predicted / actual - 1 is algebraically equal to (predicted - actual) / actual, but the second form is numerically safer when predicted is very close to actual.
- Filter invalid rows first, then compute a weighted mean of squared relative errors.
Part 2: Bootstrap a 95% CI for pRMSE with deterministic resampling
Constraints
- 0 <= len(rows) <= 5000
- 1 <= B <= 5000
- 0 <= seed <= 10^9
- Use n-of-n resampling, where n is the count of valid rows after filtering
- Rows with actual_revenue <= 0, predicted_revenue < 0, or weight <= 0 must be ignored before bootstrapping
Examples
Input: ([('A', 100.0, 100.0), ('B', 100.0, 200.0)], 4, 2, None)
Expected Output: (0.707107, 0.0, 1.0)
Explanation: The original pRMSE is sqrt((0^2 + 1^2)/2) = 0.707107. With the specified generator and B = 4, the bootstrap estimates are [1.0, 0.0, 1.0, 0.707107].
Input: ([('A', 100.0, 110.0), ('B', 200.0, 220.0)], 6, 5, None)
Expected Output: (0.1, 0.1, 0.1)
Explanation: Every valid row has the same relative error 0.1, so every bootstrap sample also has pRMSE 0.1.
Input: ([('A', 0.0, 5.0), ('B', 100.0, 120.0), ('C', 100.0, -1.0)], 5, 7, None)
Expected Output: (0.2, 0.2, 0.2)
Explanation: Only one valid row remains after filtering, so every bootstrap sample is identical.
Input: ([('A', 0.0, 1.0), ('B', -2.0, 3.0)], 4, 1, None)
Expected Output: None
Explanation: No row survives the validity checks.
Solution
def solution(rows, B, seed, weights=None):
import math
if weights is None:
weights = {}
valid = []
for country, actual, predicted in rows:
w = float(weights.get(country, 1.0))
actual = float(actual)
predicted = float(predicted)
if actual > 0.0 and predicted >= 0.0 and w > 0.0:
valid.append((country, actual, predicted, w))
n = len(valid)
if n == 0 or B <= 0:
return None
def prmse(sample):
weighted_sq_errors = []
weight_values = []
for _, actual, predicted, w in sample:
rel = (predicted - actual) / actual
weighted_sq_errors.append(w * rel * rel)
weight_values.append(w)
total_weight = math.fsum(weight_values)
return math.sqrt(math.fsum(weighted_sq_errors) / total_weight)
estimate = prmse(valid)
state = seed % 97
boot = []
for _ in range(B):
sample = []
for _ in range(n):
state = (17 * state + 43) % 97
idx = state % n
sample.append(valid[idx])
boot.append(prmse(sample))
boot.sort()
low_idx = max(0, int(math.floor(0.025 * B)))
high_idx = min(B - 1, int(math.ceil(0.975 * B)) - 1)
return (round(estimate, 6), round(boot[low_idx], 6), round(boot[high_idx], 6))
Time complexity: O(B * n). Space complexity: O(B + n).
Hints
- Resample from the filtered valid rows, not from the raw input.
- Store all bootstrap estimates, sort them, then pick the two required order-statistic positions.
Part 3: Exact probability that a bootstrap resample matches the original sample in order
Constraints
- 0 <= n <= 100
- For n > 0, the exact probability is 1 / n^n
- The fraction should be reduced
Examples
Input: (3,)
Expected Output: (1, 27)
Explanation: There are 3^3 = 27 ordered resamples of length 3, and exactly one matches the original sample in order.
Input: (1,)
Expected Output: (1, 1)
Explanation: With one element, the only possible bootstrap draw matches the original.
Input: (0,)
Expected Output: (1, 1)
Explanation: The empty sample has exactly one resample: itself.
Input: (5,)
Expected Output: (1, 3125)
Explanation: The probability is 1 / 5^5.
Solution
def solution(n):
if n < 0:
return None
if n == 0:
return (1, 1)
return (1, pow(n, n))
Time complexity: O(log n) arithmetic operations for exponentiation. Space complexity: O(1) auxiliary space.
Hints
- Each of the n positions in the resample must pick one specific original item.
- There is exactly one successful ordered resample out of n^n equally likely ordered resamples when n > 0.
Part 4: Detect bootstrap risks for pRMSE and recommend mitigations
Constraints
- 0 <= len(rows) <= 200000
- Use only valid rows with actual_revenue > 0 and predicted_revenue >= 0
- Recommendation labels must be unique and sorted lexicographically
- Median is the middle value after sorting, or the average of the two middle values for an even number of items
Examples
Input: ([('US', 'NA', 100.0, 100.0), ('CA', 'NA', 100.0, 90.0), ('MX', 'NA', 100.0, 1000.0), ('FR', 'EU', 100.0, 100.0)],)
Expected Output: ['cluster_bootstrap_by_region', 'collect_more_data', 'report_robust_metric', 'stratified_bootstrap_by_region', 'winsorize_or_log_transform']
Explanation: There are only 4 valid rows, one extreme error dominates the median, and region NA is both repeated across countries and overly dominant.
Input: ([('A', 'R1', 100.0, 102.0), ('B', 'R2', 100.0, 98.0), ('C', 'R3', 100.0, 101.0), ('D', 'R4', 100.0, 99.0), ('E', 'R5', 100.0, 100.0)],)
Expected Output: []
Explanation: There are enough valid rows, no heavy-tail warning, no repeated region with multiple countries, and no region imbalance.
Input: ([('A', 'R1', 0.0, 10.0), ('B', 'R2', -1.0, 5.0), ('C', 'R3', 100.0, -2.0)],)
Expected Output: ['insufficient_data']
Explanation: All rows are invalid.
Input: ([('A', 'R1', 100.0, 80.0), ('B', 'R1', 0.0, 10.0)],)
Expected Output: ['collect_more_data']
Explanation: Only one valid row remains, which is too little data for a stable bootstrap analysis.
Solution
def solution(rows, min_valid=5, tail_ratio=25.0, imbalance_threshold=0.6):
valid = []
for country, region, actual, predicted in rows:
actual = float(actual)
predicted = float(predicted)
if actual > 0.0 and predicted >= 0.0:
valid.append((country, region, actual, predicted))
if not valid:
return ['insufficient_data']
recommendations = set()
n = len(valid)
if n < min_valid:
recommendations.add('collect_more_data')
errors = []
region_counts = {}
region_countries = {}
for country, region, actual, predicted in valid:
rel = (predicted - actual) / actual
errors.append(rel * rel)
region_counts[region] = region_counts.get(region, 0) + 1
region_countries.setdefault(region, set()).add(country)
errors.sort()
m = len(errors)
if m % 2 == 1:
median = errors[m // 2]
else:
median = (errors[m // 2 - 1] + errors[m // 2]) / 2.0
max_error = errors[-1]
if (median == 0.0 and max_error > 0.0) or (median > 0.0 and max_error / median >= tail_ratio):
recommendations.add('winsorize_or_log_transform')
recommendations.add('report_robust_metric')
for region, countries in region_countries.items():
if region_counts[region] >= 2 and len(countries) >= 2:
recommendations.add('cluster_bootstrap_by_region')
break
if len(region_counts) >= 2:
max_share = max(region_counts.values()) / n
if max_share > imbalance_threshold:
recommendations.add('stratified_bootstrap_by_region')
return sorted(recommendations)
Time complexity: O(n log n). Space complexity: O(n).
Hints
- Start by filtering valid rows; every later rule should use the filtered data only.
- Heavy tails can be detected by comparing the largest squared relative error against the median, while dependence and imbalance come from region counts and distinct country counts.