You are given integer data split across worker nodes. Implement an exact coordinator that computes the global mode and median and also chooses the cheapest valid communication strategy under a simplified byte-cost model. Your function must consider three strategies:

(1) raw: every worker sends every integer.
(2) sparse_counts: every worker sends one (value, count) pair for each distinct local value.
(3) dense_histogram: if a value_range = (lo, hi) is provided and all data lies inside it, each worker sends one count bucket for every value in that inclusive range.

Use these cost assumptions: every stored integer costs 4 bytes to read; every raw integer sent costs 4 bytes; every sparse (value, count) pair costs 8 bytes; every dense histogram bucket costs 4 bytes. All workers read in parallel, so read_seconds = (max bytes read by any single worker) / 10. The coordinator is the network bottleneck, so network_seconds = (total sent bytes) / 1.

Return the exact mode and median, the chosen strategy, and the estimated costs. If multiple values are tied for mode, return the smallest such value. If the total element count is even, the median is the average of the two middle values; if that average is an integer, return it as an int. If the dataset is empty, return None for both mode and median. If two strategies send the same number of bytes, prefer dense_histogram over sparse_counts over raw.
Examples
Input: ([[1, 1, 1, 1, 2, 2], [2, 2, 2, 3, 3, 3], [1, 1, 3, 3, 3, 3]], (1, 3))
Expected Output: {'mode': 3, 'median': 2, 'strategy': 'dense_histogram', 'sent_bytes': 36, 'read_seconds': 2.4, 'network_seconds': 36, 'total_seconds': 38.4}
Explanation: Small known range makes a dense histogram cheaper than sending raw values or sparse maps.
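The byte arithmetic behind this example can be checked directly. A minimal sketch under the stated cost model (4 B per raw int, 8 B per sparse pair, 4 B per dense bucket; read time is the busiest worker's bytes divided by 10):

```python
from collections import Counter

partitions = [[1, 1, 1, 1, 2, 2], [2, 2, 2, 3, 3, 3], [1, 1, 3, 3, 3, 3]]
lo, hi = 1, 3

raw_bytes = sum(len(p) for p in partitions) * 4              # 18 ints   -> 72
sparse_bytes = sum(len(Counter(p)) * 8 for p in partitions)  # 6 pairs   -> 48
dense_bytes = len(partitions) * (hi - lo + 1) * 4            # 9 buckets -> 36

read_seconds = max(len(p) * 4 for p in partitions) / 10      # 24 B -> 2.4 s
total_seconds = read_seconds + min(raw_bytes, sparse_bytes, dense_bytes)

print(raw_bytes, sparse_bytes, dense_bytes, total_seconds)   # 72 48 36 38.4
```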
Input: ([[1000, 1000, 1000, 1000, 1000, 1000, 1001], [1001, 1001, 1001, 1001, 1001, 1001, 1002], [1002, 1002, 1002, 1002, 1002, 1002]], None)
Expected Output: {'mode': 1001, 'median': 1001, 'strategy': 'sparse_counts', 'sent_bytes': 40, 'read_seconds': 2.8, 'network_seconds': 40, 'total_seconds': 42.8}
Explanation: The values are large and repeated, so sparse local counts are much cheaper than shipping all raw integers.
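The same cost arithmetic, sketched for this input: heavy repetition keeps the per-worker distinct-value counts small, so five (value, count) pairs beat twenty raw integers:

```python
from collections import Counter

partitions = [[1000] * 6 + [1001], [1001] * 6 + [1002], [1002] * 6]

raw_bytes = sum(len(p) for p in partitions) * 4              # 20 ints -> 80
sparse_bytes = sum(len(Counter(p)) * 8 for p in partitions)  # 2+2+1 pairs -> 40

print(raw_bytes, sparse_bytes)  # 80 40
```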
Input: ([[1, 2, 3], [4, 5], [6]], None)
Expected Output: {'mode': 1, 'median': 3.5, 'strategy': 'raw', 'sent_bytes': 24, 'read_seconds': 1.2, 'network_seconds': 24, 'total_seconds': 25.2}
Explanation: All values are unique, so sparse summaries are more expensive than just sending the raw integers.
Input: ([[], [], []], None)
Expected Output: {'mode': None, 'median': None, 'strategy': 'sparse_counts', 'sent_bytes': 0, 'read_seconds': 0.0, 'network_seconds': 0, 'total_seconds': 0.0}
Explanation: Edge case: no worker has any data.
Solution
from collections import Counter

def solution(partitions, value_range=None):
    def median_from_counts(counts):
        # Exact median from a value -> count map, without materializing the data.
        total = sum(counts.values())
        if total == 0:
            return None

        # 1-based ranks of the two middle elements (equal when total is odd).
        left_pos = (total + 1) // 2
        right_pos = (total + 2) // 2
        running = 0
        left_val = None
        right_val = None

        for value in sorted(counts):
            running += counts[value]
            if left_val is None and running >= left_pos:
                left_val = value
            if right_val is None and running >= right_pos:
                right_val = value
                break

        if left_val == right_val:
            return left_val

        avg = (left_val + right_val) / 2
        # An integer-valued average is returned as an int, per the spec.
        if avg.is_integer():
            return int(avg)
        return avg

    total_counts = Counter()
    max_read_bytes = 0
    raw_bytes = 0
    sparse_bytes = 0

    dense_valid = (isinstance(value_range, tuple) and len(value_range) == 2
                   and value_range[0] <= value_range[1])
    if dense_valid:
        lo, hi = value_range
        range_size = hi - lo + 1

    for part in partitions:
        local_counts = Counter(part)
        total_counts.update(local_counts)

        local_bytes = len(part) * 4  # 4 bytes per stored/raw integer
        raw_bytes += local_bytes
        max_read_bytes = max(max_read_bytes, local_bytes)
        sparse_bytes += len(local_counts) * 8  # 8 bytes per (value, count) pair

        if dense_valid:
            # The dense strategy is only valid if every value fits the range.
            for value in local_counts:
                if value < lo or value > hi:
                    dense_valid = False
                    break

    # Lower priority number wins byte ties: dense_histogram > sparse_counts > raw.
    candidates = [
        (raw_bytes, 2, 'raw'),
        (sparse_bytes, 1, 'sparse_counts')
    ]

    if dense_valid:
        dense_bytes = len(partitions) * range_size * 4  # 4 bytes per bucket
        candidates.append((dense_bytes, 0, 'dense_histogram'))

    sent_bytes, _, strategy = min(candidates, key=lambda item: (item[0], item[1]))

    if not total_counts:
        mode = None
        median = None
    else:
        max_freq = max(total_counts.values())
        # Smallest value among those tied for the highest frequency.
        mode = min(value for value, freq in total_counts.items() if freq == max_freq)
        median = median_from_counts(total_counts)

    read_seconds = round(max_read_bytes / 10.0, 1)  # parallel reads: busiest worker / 10
    network_seconds = sent_bytes                    # coordinator bottleneck: total / 1
    total_seconds = round(read_seconds + network_seconds, 1)

    return {
        'mode': mode,
        'median': median,
        'strategy': strategy,
        'sent_bytes': sent_bytes,
        'read_seconds': read_seconds,
        'network_seconds': network_seconds,
        'total_seconds': total_seconds
    }

Time complexity: O(N + U log U), where N is the total number of integers and U is the number of distinct values globally. Space complexity: O(U).
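As an informal cross-check (a sketch, not part of the required solution), the merged counts from example 2 can be expanded and compared against Python's statistics module. Note that statistics.median returns the float 1001.0 here, which the solution's integer-average rule normalizes to int 1001:

```python
from collections import Counter
from statistics import median, multimode

counts = Counter({1000: 6, 1001: 7, 1002: 7})  # merged counts from example 2
data = sorted(counts.elements())               # fine at this scale; the real
                                               # solution never materializes data

print(min(multimode(data)))  # smallest tied mode -> 1001
print(median(data))          # -> 1001.0 (solution returns int 1001)
```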