You are given records as tuples (country, region, actual_revenue, predicted_revenue). Build a rule-based advisor that flags when an ordinary bootstrap confidence interval (CI) for pRMSE may be unreliable. Work only on valid rows, meaning rows with actual_revenue > 0 and predicted_revenue >= 0. The thresholds min_valid, tail_ratio, and imbalance_threshold are parameters; the solution below defaults them to 5, 25.0, and 0.6. Apply these rules:
(1) If there are no valid rows, return ['insufficient_data'].
(2) If the number of valid rows is less than min_valid, include 'collect_more_data'.
(3) Compute the squared relative errors e_i = ((predicted - actual) / actual)^2 and let m be their median. If m = 0 and max(e_i) > 0, or if m > 0 and max(e_i) / m >= tail_ratio, include both 'winsorize_or_log_transform' and 'report_robust_metric'.
(4) If any region contains valid rows from at least 2 distinct countries, include 'cluster_bootstrap_by_region'.
(5) If there are at least 2 regions among the valid rows and the largest region contains strictly more than imbalance_threshold of the valid rows, include 'stratified_bootstrap_by_region'.
Return all triggered recommendations sorted lexicographically.
Examples
Input: ([('US', 'NA', 100.0, 100.0), ('CA', 'NA', 100.0, 90.0), ('MX', 'NA', 100.0, 1000.0), ('FR', 'EU', 100.0, 100.0)],)
Expected Output: ['cluster_bootstrap_by_region', 'collect_more_data', 'report_robust_metric', 'stratified_bootstrap_by_region', 'winsorize_or_log_transform']
Explanation: There are only 4 valid rows (fewer than min_valid = 5), one extreme error is far larger than the median error, and region NA both contains three distinct countries and holds 3 of the 4 valid rows (75%, above the 0.6 threshold).
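The heavy-tail flag in this example can be checked by hand. The sketch below recomputes the squared relative errors for the four rows and compares the maximum to the median. The variable names are illustrative, and statistics.median stands in for the manual median used in the solution; both average the two middle values for an even count.

```python
from statistics import median

# (actual, predicted) pairs taken from the first example
pairs = [(100.0, 100.0), (100.0, 90.0), (100.0, 1000.0), (100.0, 100.0)]

# Squared relative errors e_i = ((predicted - actual) / actual) ** 2
errors = [((p - a) / a) ** 2 for a, p in pairs]

m = median(errors)        # mean of the two middle values, roughly 0.005
ratio = max(errors) / m   # roughly 81 / 0.005 = 16200

print(sorted(errors))
print(m, ratio, ratio >= 25.0)
```

The ratio is several orders of magnitude above tail_ratio = 25.0, so both 'winsorize_or_log_transform' and 'report_robust_metric' are triggered.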
Input: ([('A', 'R1', 100.0, 102.0), ('B', 'R2', 100.0, 98.0), ('C', 'R3', 100.0, 101.0), ('D', 'R4', 100.0, 99.0), ('E', 'R5', 100.0, 100.0)],)
Expected Output: []
Explanation: There are enough valid rows (5 is not less than min_valid = 5), no heavy-tail warning, no region with more than one country, and no region imbalance.
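To make the imbalance check concrete, here is a minimal sketch of the largest-region share from rule (5). The helper name largest_region_share is hypothetical, and it assumes a non-empty list of regions taken from valid rows only.

```python
from collections import Counter

def largest_region_share(regions):
    """Return the fraction of rows that fall in the most common region."""
    counts = Counter(regions)
    return max(counts.values()) / len(regions)

balanced = ['R1', 'R2', 'R3', 'R4', 'R5']  # regions of the valid rows in this example
skewed = ['NA', 'NA', 'NA', 'EU']          # regions of the valid rows in the first example

print(largest_region_share(balanced))  # 0.2
print(largest_region_share(skewed))    # 0.75
```

With imbalance_threshold = 0.6, only the second list exceeds the threshold. Rule (5) additionally requires at least 2 regions, so a single-region dataset never triggers it even though its share is 1.0.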
Input: ([('A', 'R1', 0.0, 10.0), ('B', 'R2', -1.0, 5.0), ('C', 'R3', 100.0, -2.0)],)
Expected Output: ['insufficient_data']
Explanation: All rows are invalid.
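The validity filter can be sketched on its own to show why each row is dropped. The helper name is_valid is an assumption for illustration; the condition itself is the one given in the problem statement.

```python
def is_valid(actual, predicted):
    """A row is valid when actual revenue is positive and the prediction is non-negative."""
    return actual > 0.0 and predicted >= 0.0

rows = [
    ('A', 'R1', 0.0, 10.0),    # dropped: actual is zero, relative error is undefined
    ('B', 'R2', -1.0, 5.0),    # dropped: actual is negative
    ('C', 'R3', 100.0, -2.0),  # dropped: prediction is negative
]

valid = [row for row in rows if is_valid(row[2], row[3])]
print(valid)  # []
```

A prediction of exactly 0 is still valid, since only the actual value has to be strictly positive.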
Input: ([('A', 'R1', 100.0, 80.0), ('B', 'R1', 0.0, 10.0)],)
Expected Output: ['collect_more_data']
Explanation: Only one valid row remains, which is too little data for a stable bootstrap analysis.
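This example also shows that the region rules look at valid rows only. If the invalid row for country B were counted, region R1 would contain two distinct countries and rule (4) would fire. A minimal sketch of the grouping, with a hypothetical helper countries_by_region and a valid_only switch added purely for comparison:

```python
from collections import defaultdict

rows = [('A', 'R1', 100.0, 80.0), ('B', 'R1', 0.0, 10.0)]

def countries_by_region(rows, valid_only=True):
    """Map each region to the set of countries seen in it."""
    groups = defaultdict(set)
    for country, region, actual, predicted in rows:
        # Skip invalid rows unless the caller asks to include them.
        if valid_only and not (actual > 0.0 and predicted >= 0.0):
            continue
        groups[region].add(country)
    return dict(groups)

print(countries_by_region(rows))                    # {'R1': {'A'}}
print(countries_by_region(rows, valid_only=False))  # R1 would hold both A and B
```

Because filtering happens first, R1 has a single country and 'cluster_bootstrap_by_region' is not included.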
Solution
def solution(rows, min_valid=5, tail_ratio=25.0, imbalance_threshold=0.6):
    # Keep only rows with positive actual revenue and non-negative predictions.
    valid = []
    for country, region, actual, predicted in rows:
        actual = float(actual)
        predicted = float(predicted)
        if actual > 0.0 and predicted >= 0.0:
            valid.append((country, region, actual, predicted))
    # Rule 1: nothing to analyze.
    if not valid:
        return ['insufficient_data']
    recommendations = set()
    n = len(valid)
    # Rule 2: too few valid rows.
    if n < min_valid:
        recommendations.add('collect_more_data')
    # Collect squared relative errors and per-region statistics in one pass.
    errors = []
    region_counts = {}
    region_countries = {}
    for country, region, actual, predicted in valid:
        rel = (predicted - actual) / actual
        errors.append(rel * rel)
        region_counts[region] = region_counts.get(region, 0) + 1
        region_countries.setdefault(region, set()).add(country)
    # Rule 3: compare the largest error to the median error.
    errors.sort()
    m = len(errors)
    if m % 2 == 1:
        median = errors[m // 2]
    else:
        median = (errors[m // 2 - 1] + errors[m // 2]) / 2.0
    max_error = errors[-1]
    if (median == 0.0 and max_error > 0.0) or (median > 0.0 and max_error / median >= tail_ratio):
        recommendations.add('winsorize_or_log_transform')
        recommendations.add('report_robust_metric')
    # Rule 4: some region has valid rows from at least 2 distinct countries.
    for region, countries in region_countries.items():
        if region_counts[region] >= 2 and len(countries) >= 2:
            recommendations.add('cluster_bootstrap_by_region')
            break
    # Rule 5: with at least 2 regions, the largest one holds too large a share.
    if len(region_counts) >= 2:
        max_share = max(region_counts.values()) / n
        if max_share > imbalance_threshold:
            recommendations.add('stratified_bootstrap_by_region')
    return sorted(recommendations)
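Time complexity: O(n log n), dominated by sorting the errors to find the median; every other step is a single pass over the rows. Space complexity: O(n).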