Streaming Entropy with Numerical Stability
Company: OpenAI
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question tests a candidate's ability to implement numerically stable online algorithms for computing Shannon entropy over streaming data. It evaluates mastery of floating-point precision, O(1) incremental state maintenance, and information-theoretic concepts applied to unbounded sequences — skills central to machine learning engineering roles.
Streaming Entropy (Single-Label Updates)
Constraints
- Entropy of the empty stream is 0.0.
- Entropy is 0.0 whenever all mass is on a single category.
- Each update() is O(1) amortized, independent of the number of distinct categories.
- Each entropy() is O(1) and must not re-scan the per-category counts.
- A category with count 0 contributes 0 (limit of p ln p as p -> 0).
- Results must be within 1e-6 nats of the reference, including at n ~ 1e9.
Examples
Input: ([['entropy']],)
Expected Output: [0.0]
Explanation: Empty stream: entropy of nothing is defined as 0.0.
Input: ([['update', 'x'], ['update', 'x'], ['update', 'x'], ['entropy']],)
Expected Output: [0.0]
Explanation: All mass on one category -> entropy 0 (ln 1 = 0).
Input: ([['update', 'a'], ['update', 'b'], ['entropy']],)
Expected Output: [0.693147181]
Explanation: Two categories each with count 1 -> H = ln 2 = 0.6931... nats.
Input: ([['entropy'], ['update', 'a'], ['entropy'], ['update', 'b'], ['update', 'b'], ['entropy'], ['update', 'c'], ['entropy']],)
Expected Output: [0.0, 0.0, 0.636514168, 1.039720771]
Explanation: Progressive updates; entropy queried at several points along the stream.
Input: ([['update', 'red'], ['update', 'red'], ['update', 'blue'], ['update', 'green'], ['update', 'green'], ['update', 'green'], ['update', 'red'], ['update', 'red'], ['update', 'yellow'], ['update', 'red'], ['update', 'red'], ['update', 'red'], ['update', 'green'], ['update', 'green'], ['update', 'red'], ['update', 'green'], ['update', 'yellow'], ['update', 'green'], ['update', 'yellow'], ['update', 'blue'], ['update', 'red'], ['update', 'green'], ['update', 'yellow'], ['update', 'blue'], ['update', 'blue'], ['update', 'green'], ['update', 'green'], ['update', 'blue'], ['update', 'red'], ['update', 'red'], ['update', 'yellow'], ['update', 'red'], ['update', 'blue'], ['update', 'blue'], ['update', 'blue'], ['update', 'red'], ['update', 'yellow'], ['update', 'red'], ['update', 'yellow'], ['update', 'red'], ['update', 'blue'], ['update', 'blue'], ['update', 'green'], ['update', 'red'], ['update', 'red'], ['update', 'green'], ['update', 'blue'], ['update', 'red'], ['update', 'green'], ['update', 'red'], ['update', 'yellow'], ['update', 'blue'], ['update', 'yellow'], ['update', 'blue'], ['update', 'green'], ['update', 'blue'], ['update', 'blue'], ['update', 'green'], ['update', 'blue'], ['update', 'red'], ['update', 'green'], ['update', 'green'], ['update', 'green'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'blue'], ['update', 'green'], ['update', 'blue'], ['update', 'red'], ['update', 'green'], ['update', 'red'], ['update', 'blue'], ['update', 'yellow'], ['update', 'blue'], ['update', 'red'], ['update', 'green'], ['update', 'blue'], ['update', 'green'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'green'], ['update', 'blue'], ['update', 'green'], ['update', 'green'], ['update', 'blue'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'blue'], ['update', 'green'], ['update', 'green'], ['update', 'yellow'], ['update', 'red'], ['update', 'red'], ['update', 'red'], ['update', 'green'], ['update', 'green'], ['update', 'yellow'], ['update', 'red'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'blue'], ['update', 'red'], ['update', 'red'], ['update', 'blue'], ['update', 'blue'], ['update', 'red'], ['update', 'blue'], ['update', 'yellow'], ['update', 'green'], ['update', 'yellow'], ['update', 'red'], ['update', 'blue'], ['update', 'green'], ['update', 'red'], ['update', 'blue'], ['update', 'green'], ['update', 'green'], ['update', 'blue'], ['update', 'green'], ['update', 'red'], ['update', 'blue'], ['update', 'yellow'], ['update', 'red'], ['update', 'red'], ['update', 'blue'], ['update', 'blue'], ['update', 'green'], ['update', 'red'], ['update', 'green'], ['update', 'red'], ['update', 'red'], ['update', 'yellow'], ['update', 'red'], ['update', 'green'], ['update', 'green'], ['update', 'yellow'], ['update', 'green'], ['update', 'blue'], ['update', 'yellow'], ['update', 'green'], ['update', 'green'], ['update', 'blue'], ['update', 'yellow'], ['update', 'blue'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'red'], ['update', 'green'], ['update', 'green'], ['update', 'red'], ['update', 'blue'], ['update', 'red'], ['update', 'green'], ['update', 'green'], ['update', 'red'], ['update', 'red'], ['update', 'red'], ['update', 'green'], ['update', 'red'], ['update', 'red'], ['update', 'blue'], ['update', 'red'], ['update', 'green'], ['update', 'blue'], ['update', 'yellow'], ['update', 'green'], ['update', 'green'], ['update', 'yellow'], ['update', 'green'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'green'], ['update', 'red'], ['update', 'red'], ['update', 'yellow'], ['update', 'blue'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'red'], ['update', 'red'], ['update', 'red'], ['update', 'yellow'], ['update', 'blue'], ['update', 'red'], ['update', 'green'], ['update', 'green'], ['update', 'green'], ['update', 'yellow'], ['update', 'green'], ['update', 'yellow'], ['update', 'green'], ['update', 'blue'], ['update', 'yellow'], ['update', 'green'], ['update', 'red'], ['update', 'yellow'], ['update', 'red'], ['update', 'red'], ['update', 'red'], ['update', 'red'], ['update', 'green'], ['update', 'green'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'green'], ['update', 'yellow'], ['update', 'red'], ['update', 'green'], ['update', 'yellow'], ['update', 'red'], ['update', 'yellow'], ['update', 'blue'], ['update', 'yellow'], ['update', 'blue'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'green'], ['update', 'green'], ['update', 'blue'], ['update', 'green'], ['update', 'red'], ['update', 'red'], ['update', 'blue'], ['update', 'red'], ['update', 'red'], ['update', 'yellow'], ['update', 'green'], ['update', 'red'], ['update', 'red'], ['update', 'green'], ['update', 'red'], ['update', 'red'], ['update', 'green'], ['update', 'yellow'], ['update', 'red'], ['update', 'green'], ['update', 'red'], ['update', 'red'], ['update', 'yellow'], ['update', 'blue'], ['update', 'blue'], ['update', 'green'], ['update', 'blue'], ['update', 'green'], ['update', 'blue'], ['update', 'yellow'], ['update', 'green'], ['update', 'blue'], ['update', 'yellow'], ['update', 'blue'], ['update', 'red'], ['update', 'red'], ['update', 'yellow'], ['update', 'red'], ['update', 'red'], ['update', 'green'], ['update', 'blue'], ['update', 'green'], ['update', 'blue'], ['update', 'red'], ['update', 'green'], ['update', 'blue'], ['update', 'blue'], ['update', 'green'], ['update', 'yellow'], ['update', 'blue'], ['update', 'red'], ['update', 'blue'], ['update', 'red'], ['update', 'green'], ['update', 'blue'], ['update', 'red'], ['update', 'red'], ['update', 'green'], ['update', 'blue'], ['update', 'blue'], ['update', 'green'], ['update', 'blue'], ['update', 'green'], ['update', 'blue'], ['update', 'yellow'], ['update', 'blue'], ['update', 'red'], ['update', 'red'], ['update', 'yellow'], ['update', 'blue'], ['update', 'red'], ['update', 'red'], ['update', 'blue'], ['update', 'green'], ['update', 'blue'], ['update', 'green'], ['update', 'yellow'], ['update', 'yellow'], ['update', 'red'], ['update', 'red'], ['entropy']],)
Expected Output: [1.378517711]
Explanation: 300 updates over 4 labels; final empirical entropy near ln 4.
Hints
- Rewrite H = -sum (c_i/n) ln(c_i/n) = ln(n) - (1/n) sum c_i ln(c_i). Maintain only S = sum c_i ln c_i and the total n; then entropy is ln(n) - S/n in O(1).
- On update of a category from count c to c+1, S changes by (c+1)ln(c+1) - c ln c — an O(1) delta. No need to touch any other category.
- Naive double accumulation of S drifts past 1e-6 by n~1e9. Use Neumaier/Kahan compensated summation (keep a running compensation term) so the many small +/- deltas don't lose low-order bits.
Streaming Entropy (Block-Wise / Batched Updates)
Constraints
- update_block(block) incorporates a k-category block in O(k) time.
- Entropy after any sequence of blocks equals the entropy of the merged multiset of all labels.
- Single update() stays O(1); entropy() stays O(1) and must not re-scan counts.
- Empty stream -> 0.0; all mass on one label -> 0.0.
- Results within 1e-6 nats of the reference, including totals up to ~1e9.
Examples
Input: ([['entropy']],)
Expected Output: [0.0]
Explanation: Empty stream -> 0.0, even with block API available.
Input: ([['update_block', {'x': 1000000}], ['entropy']],)
Expected Output: [0.0]
Explanation: A single block puts all mass on one label -> entropy 0.
Input: ([['entropy'], ['update_block', {'a': 1000, 'b': 3, 'c': 50000}], ['entropy'], ['update', 'a'], ['update_block', {'d': 2, 'a': 7}], ['entropy'], ['update_block', {'b': 1}], ['entropy']],)
Expected Output: [0.0, 0.09713507, 0.098169038, 0.098355111]
Explanation: Interleaved blocks and a single update; entropy equals that of the merged multiset.
Input: ([['update', 'a'], ['update', 'a'], ['update_block', {'a': 1, 'b': 3}], ['entropy']],)
Expected Output: [0.693147181]
Explanation: Single updates then a block touching the same label; result matches streaming all labels one at a time.
Input: ([['update_block', {'cat0': 1179126, 'cat1': 816353, 'cat2': 1328004, 'cat3': 1865108, 'cat4': 601263, 'cat5': 651909, 'cat6': 1623826, 'cat7': 697405, 'cat8': 1266905, 'cat9': 1722195, 'cat10': 621632, 'cat11': 1564169, 'cat12': 950254, 'cat13': 578634, 'cat14': 680244, 'cat15': 1409420, 'cat16': 1376970, 'cat17': 646497, 'cat18': 1004706, 'cat19': 690238, 'cat20': 1655629, 'cat21': 1390281, 'cat22': 623963, 'cat23': 1685842, 'cat24': 759631, 'cat25': 968166, 'cat26': 1822518, 'cat27': 1815822, 'cat28': 1722633, 'cat29': 629734, 'cat30': 1710272, 'cat31': 1727969, 'cat32': 1331899, 'cat33': 603996, 'cat34': 963642, 'cat35': 597690, 'cat36': 1667410, 'cat37': 779287, 'cat38': 1107354, 'cat39': 1378998, 'cat40': 802524, 'cat41': 1633900, 'cat42': 747028, 'cat43': 1697292, 'cat44': 1146933, 'cat45': 1674944, 'cat46': 1930263, 'cat47': 879010, 'cat48': 716123, 'cat49': 1719703, 'cat50': 1697902, 'cat51': 1839898, 'cat52': 893994, 'cat53': 1280974, 'cat54': 704326, 'cat55': 1648703, 'cat56': 1993404, 'cat57': 631678, 'cat58': 1683566, 'cat59': 624992, 'cat60': 1798157, 'cat61': 931926, 'cat62': 1541056, 'cat63': 1926902, 'cat64': 1615098, 'cat65': 1396726, 'cat66': 1158814, 'cat67': 1476437, 'cat68': 1728012, 'cat69': 1450396, 'cat70': 1258293, 'cat71': 1128656, 'cat72': 1020988, 'cat73': 876998, 'cat74': 1965897, 'cat75': 1011907, 'cat76': 671662, 'cat77': 1704653, 'cat78': 1129668, 'cat79': 1601416, 'cat80': 1538334, 'cat81': 1220320, 'cat82': 1441273, 'cat83': 1103849, 'cat84': 1777079, 'cat85': 653513, 'cat86': 747601, 'cat87': 1573600, 'cat88': 1376867, 'cat89': 845950, 'cat90': 1217343, 'cat91': 818734, 'cat92': 1525429, 'cat93': 1384365, 'cat94': 582223, 'cat95': 1901350, 'cat96': 662781, 'cat97': 1670369, 'cat98': 1701722, 'cat99': 1157976, 'cat100': 1213288, 'cat101': 1958140, 'cat102': 1234377, 'cat103': 1746483, 'cat104': 1541602, 'cat105': 1716128, 'cat106': 1456731, 'cat107': 644206, 'cat108': 696285, 'cat109': 1066103, 'cat110': 1494256, 'cat111': 1961803, 'cat112': 1892828, 'cat113': 636314, 'cat114': 627233, 'cat115': 1971135, 'cat116': 1149293, 'cat117': 1857127, 'cat118': 1712041, 'cat119': 1928657, 'cat120': 1434576, 'cat121': 1096840, 'cat122': 1309063, 'cat123': 1902266, 'cat124': 1227722, 'cat125': 547317, 'cat126': 1468245, 'cat127': 1245462, 'cat128': 852422, 'cat129': 1781191, 'cat130': 745567, 'cat131': 1535349, 'cat132': 623636, 'cat133': 957614, 'cat134': 1102788, 'cat135': 771246, 'cat136': 1019285, 'cat137': 1334451, 'cat138': 1319880, 'cat139': 1541250, 'cat140': 668991, 'cat141': 848895, 'cat142': 1442014, 'cat143': 1342309, 'cat144': 1652259, 'cat145': 1082670, 'cat146': 787154, 'cat147': 1402869, 'cat148': 1653894, 'cat149': 1083891, 'cat150': 1981421, 'cat151': 1370939, 'cat152': 1252397, 'cat153': 1931774, 'cat154': 1297843, 'cat155': 983920, 'cat156': 816504, 'cat157': 674031, 'cat158': 869555, 'cat159': 817295, 'cat160': 986448, 'cat161': 1881009, 'cat162': 989341, 'cat163': 525298, 'cat164': 1517040, 'cat165': 1735481, 'cat166': 882400, 'cat167': 1051019, 'cat168': 1091251, 'cat169': 508584, 'cat170': 805505, 'cat171': 1378594, 'cat172': 1621118, 'cat173': 1274380, 'cat174': 1778869, 'cat175': 1687703, 'cat176': 1168177, 'cat177': 763174, 'cat178': 1948070, 'cat179': 1581063, 'cat180': 1795185, 'cat181': 1873564, 'cat182': 1918094, 'cat183': 613231, 'cat184': 1457651, 'cat185': 1927269, 'cat186': 1672877, 'cat187': 1322878, 'cat188': 1334812, 'cat189': 1336719, 'cat190': 1326529, 'cat191': 717133, 'cat192': 1509826, 'cat193': 1830201, 'cat194': 1339789, 'cat195': 630543, 'cat196': 899737, 'cat197': 641238, 'cat198': 937808, 'cat199': 1424061, 'cat200': 840374, 'cat201': 730536, 'cat202': 1213144, 'cat203': 1759816, 'cat204': 610259, 'cat205': 714705, 'cat206': 500489, 'cat207': 1688631, 'cat208': 817225, 'cat209': 1625370, 'cat210': 712786, 'cat211': 1262545, 'cat212': 1787100, 'cat213': 553479, 'cat214': 647462, 'cat215': 936108, 'cat216': 1787796, 'cat217': 1289010, 'cat218': 811532, 'cat219': 1830453, 'cat220': 1029022, 'cat221': 1228528, 'cat222': 1763071, 'cat223': 1263706, 'cat224': 1494367, 'cat225': 757618, 'cat226': 741913, 'cat227': 1523552, 'cat228': 1477250, 'cat229': 1507461, 'cat230': 1514674, 'cat231': 1154001, 'cat232': 680113, 'cat233': 802236, 'cat234': 714302, 'cat235': 1218559, 'cat236': 1055235, 'cat237': 1503742, 'cat238': 1951348, 'cat239': 838561, 'cat240': 1582831, 'cat241': 548435, 'cat242': 930367, 'cat243': 1607836, 'cat244': 1258649, 'cat245': 807447, 'cat246': 1947176, 'cat247': 1639115, 'cat248': 556712, 'cat249': 1607525, 'cat250': 1125139, 'cat251': 1848294, 'cat252': 690862, 'cat253': 1960030, 'cat254': 1047598, 'cat255': 1587157, 'cat256': 1269025, 'cat257': 850312, 'cat258': 1245948, 'cat259': 967230, 'cat260': 1616927, 'cat261': 1635748, 'cat262': 1554232, 'cat263': 1191357, 'cat264': 1834715, 'cat265': 967752, 'cat266': 1786032, 'cat267': 909250, 'cat268': 1002032, 'cat269': 1340296, 'cat270': 975507, 'cat271': 919258, 'cat272': 1585567, 'cat273': 1533438, 'cat274': 1245668, 'cat275': 560775, 'cat276': 558588, 'cat277': 1085983, 'cat278': 1490359, 'cat279': 1043528, 'cat280': 906102, 'cat281': 1952323, 'cat282': 1769068, 'cat283': 1222009, 'cat284': 1437904, 'cat285': 1232995, 'cat286': 1264696, 'cat287': 668900, 'cat288': 962343, 'cat289': 714239, 'cat290': 975730, 'cat291': 1485829, 'cat292': 912522, 'cat293': 1208286, 'cat294': 928602, 'cat295': 1512197, 'cat296': 1808762, 'cat297': 1779812, 'cat298': 504002, 'cat299': 1505528, 'cat300': 1869394, 'cat301': 1221434, 'cat302': 1848747, 'cat303': 677793, 'cat304': 1885348, 'cat305': 751456, 'cat306': 1314818, 'cat307': 1992108, 'cat308': 918003, 'cat309': 1502507, 'cat310': 874387, 'cat311': 1410006, 'cat312': 1833457, 'cat313': 1197339, 'cat314': 681927, 'cat315': 1330133, 'cat316': 1471318, 'cat317': 1341769, 'cat318': 678088, 'cat319': 833145, 'cat320': 856523, 'cat321': 766418, 'cat322': 557774, 'cat323': 816985, 'cat324': 1739023, 'cat325': 1475917, 'cat326': 1875434, 'cat327': 806549, 'cat328': 1782562, 'cat329': 1749630, 'cat330': 1494798, 'cat331': 1878391, 'cat332': 1234857, 'cat333': 826972, 'cat334': 1650623, 'cat335': 1649838, 'cat336': 774693, 'cat337': 544872, 'cat338': 529869, 'cat339': 1862466, 'cat340': 715528, 'cat341': 1604320, 'cat342': 792029, 'cat343': 1409764, 'cat344': 908536, 'cat345': 942587, 'cat346': 558707, 'cat347': 1028135, 'cat348': 946231, 'cat349': 1114395, 'cat350': 1551012, 'cat351': 1004447, 'cat352': 1729847, 'cat353': 1183649, 'cat354': 1043927, 'cat355': 1641590, 'cat356': 1378733, 'cat357': 774881, 'cat358': 627726, 'cat359': 1241938, 'cat360': 1460833, 'cat361': 1889310, 'cat362': 1723371, 'cat363': 1583726, 'cat364': 1382121, 'cat365': 1552034, 'cat366': 774230, 'cat367': 1615317, 'cat368': 818423, 'cat369': 1597872, 'cat370': 1570694, 'cat371': 539226, 'cat372': 1423008, 'cat373': 884005, 'cat374': 1776231, 'cat375': 508247, 'cat376': 814158, 'cat377': 861437, 'cat378': 796870, 'cat379': 1492986, 'cat380': 1798349, 'cat381': 752364, 'cat382': 1667013, 'cat383': 629510, 'cat384': 1183634, 'cat385': 1930952, 'cat386': 1587056, 'cat387': 1613013, 'cat388': 1664846, 'cat389': 1511848, 'cat390': 722527, 'cat391': 1675026, 'cat392': 619165, 'cat393': 1021130, 'cat394': 901198, 'cat395': 1080737, 'cat396': 588497, 'cat397': 704986, 'cat398': 1564753, 'cat399': 1448281, 'cat400': 1678031, 'cat401': 558438, 'cat402': 632894, 'cat403': 1429559, 'cat404': 1182861, 'cat405': 1784564, 'cat406': 1560221, 'cat407': 1771162, 'cat408': 1574080, 'cat409': 918178, 'cat410': 1952762, 'cat411': 1081300, 'cat412': 1448637, 'cat413': 1565680, 'cat414': 1618380, 'cat415': 1502514, 'cat416': 1564832, 'cat417': 1019371, 'cat418': 1966366, 'cat419': 1597250, 'cat420': 1044404, 'cat421': 1673385, 'cat422': 924858, 'cat423': 1438534, 'cat424': 787591, 'cat425': 1373751, 'cat426': 755059, 'cat427': 1322847, 'cat428': 1427188, 'cat429': 1162657, 'cat430': 652140, 'cat431': 1907514, 'cat432': 1004656, 'cat433': 1398291, 'cat434': 653345, 'cat435': 946042, 'cat436': 1903984, 'cat437': 1134975, 'cat438': 756586, 'cat439': 823898, 'cat440': 1849429, 'cat441': 1884658, 'cat442': 1267942, 'cat443': 799848, 'cat444': 1030805, 'cat445': 787843, 'cat446': 1480913, 'cat447': 960509, 'cat448': 697395, 'cat449': 1335205, 'cat450': 1521859, 'cat451': 841406, 'cat452': 1900547, 'cat453': 969158, 'cat454': 838618, 'cat455': 1981267, 'cat456': 1404966, 'cat457': 1581302, 'cat458': 1346850, 'cat459': 1211178, 'cat460': 1383480, 'cat461': 910506, 'cat462': 1247875, 'cat463': 1167996, 'cat464': 693344, 'cat465': 1267458, 'cat466': 540858, 'cat467': 1208794, 'cat468': 1661927, 'cat469': 1461902, 'cat470': 1423707, 'cat471': 1974615, 'cat472': 537920, 'cat473': 1306028, 'cat474': 1195201, 'cat475': 1585137, 'cat476': 1808469, 'cat477': 1119612, 'cat478': 1574291, 'cat479': 634827, 'cat480': 736663, 'cat481': 979312, 'cat482': 719738, 'cat483': 676289, 'cat484': 1056928, 'cat485': 1070258, 'cat486': 583022, 'cat487': 880740, 'cat488': 1067166, 'cat489': 771697, 'cat490': 1385531, 'cat491': 1917619, 'cat492': 1042342, 'cat493': 1351334, 'cat494': 813247, 'cat495': 1625329, 'cat496': 1579577, 'cat497': 1696624, 'cat498': 1537276, 'cat499': 1968881, 'cat500': 1185870, 'cat501': 687615, 'cat502': 1085236, 'cat503': 620641, 'cat504': 1943271, 'cat505': 884500, 'cat506': 1391954, 'cat507': 651863, 'cat508': 1063973, 'cat509': 535298, 'cat510': 1830516, 'cat511': 685736, 'cat512': 1046416, 'cat513': 675620, 'cat514': 1775440, 'cat515': 966423, 'cat516': 639717, 'cat517': 1054593, 'cat518': 755176, 'cat519': 1451632, 'cat520': 524214, 'cat521': 1211252, 'cat522': 1659859, 'cat523': 1376106, 'cat524': 1061742, 'cat525': 1803806, 'cat526': 771004, 'cat527': 590608, 'cat528': 1605020, 'cat529': 1988006, 'cat530': 1000036, 'cat531': 729536, 'cat532': 838583, 'cat533': 1049234, 'cat534': 605653, 'cat535': 879890, 'cat536': 923138, 'cat537': 1154295, 'cat538': 1818418, 'cat539': 1139642, 'cat540': 1613767, 'cat541': 931743, 'cat542': 1108091, 'cat543': 1434673, 'cat544': 1548761, 'cat545': 1909615, 'cat546': 873083, 'cat547': 1067326, 'cat548': 1227713, 'cat549': 538090, 'cat550': 1025229, 'cat551': 577488, 'cat552': 532183, 'cat553': 538658, 'cat554': 1560433, 'cat555': 1655633, 'cat556': 897319, 'cat557': 1578428, 'cat558': 1495645, 'cat559': 1015227, 'cat560': 1437543, 'cat561': 722888, 'cat562': 1880597, 'cat563': 1863371, 'cat564': 1406342, 'cat565': 1876800, 'cat566': 1538093, 'cat567': 1644848, 'cat568': 1324361, 'cat569': 1562597, 'cat570': 1145467, 'cat571': 1942298, 'cat572': 951267, 'cat573': 981435, 'cat574': 1218703, 'cat575': 916545, 'cat576': 1982110, 'cat577': 1833740, 'cat578': 793011, 'cat579': 1348712, 'cat580': 1228869, 'cat581': 614061, 'cat582': 772249, 'cat583': 529895, 'cat584': 648316, 'cat585': 1811660, 'cat586': 1036019, 'cat587': 1403328, 'cat588': 842352, 'cat589': 616184, 'cat590': 677177, 'cat591': 1895083, 'cat592': 1298766, 'cat593': 1561039, 'cat594': 1906231, 'cat595': 1091256, 'cat596': 1755729, 'cat597': 1007957, 'cat598': 1952666, 'cat599': 1114588, 'cat600': 594869, 'cat601': 1463542, 'cat602': 888711, 'cat603': 830370, 'cat604': 1064210, 'cat605': 1434961, 'cat606': 507597, 'cat607': 1052060, 'cat608': 1263659, 'cat609': 1189808, 'cat610': 1647296, 'cat611': 1178499, 'cat612': 1012641, 'cat613': 572240, 'cat614': 1149169, 'cat615': 956897, 'cat616': 1247810, 'cat617': 883690, 'cat618': 502241, 'cat619': 1203243, 'cat620': 1300329, 'cat621': 675931, 'cat622': 1495399, 'cat623': 1084956, 'cat624': 1554372, 'cat625': 1875769, 'cat626': 921485, 'cat627': 1020469, 'cat628': 1558507, 'cat629': 510382, 'cat630': 690529, 'cat631': 1054001, 'cat632': 688226, 'cat633': 801707, 'cat634': 1337835, 'cat635': 1730610, 'cat636': 587381, 'cat637': 1326233, 'cat638': 547173, 'cat639': 1128403, 'cat640': 1138047, 'cat641': 1820513, 'cat642': 988237, 'cat643': 677173, 'cat644': 1728057, 'cat645': 1609790, 'cat646': 825587, 'cat647': 1878969, 'cat648': 1751075, 'cat649': 1316875, 'cat650': 1183954, 'cat651': 1536393, 'cat652': 813446, 'cat653': 1095961, 'cat654': 1797522, 'cat655': 1848928, 'cat656': 803567, 'cat657': 591830, 'cat658': 1999486, 'cat659': 1575799, 'cat660': 1815610, 'cat661': 1400191, 'cat662': 1970215, 'cat663': 1560196, 'cat664': 792149, 'cat665': 1598398, 'cat666': 1557742, 'cat667': 1692187, 'cat668': 533721, 'cat669': 1939635, 'cat670': 1724865, 'cat671': 1991465, 'cat672': 1932135, 'cat673': 1954010, 'cat674': 1848237, 'cat675': 982220, 'cat676': 678450, 'cat677': 565348, 'cat678': 587791, 'cat679': 779116, 'cat680': 1836137, 'cat681': 1256458, 'cat682': 720025, 'cat683': 1289825, 'cat684': 1446625, 'cat685': 1671316, 'cat686': 606494, 'cat687': 1816523, 'cat688': 539511, 'cat689': 1813292, 'cat690': 1614518, 'cat691': 1927456, 'cat692': 1012878, 'cat693': 1526124, 'cat694': 1053213, 'cat695': 506950, 'cat696': 1458290, 'cat697': 647034, 'cat698': 1554806, 'cat699': 1622395, 'cat700': 692816, 'cat701': 1882651, 'cat702': 1603081, 'cat703': 638517, 'cat704': 1493753, 'cat705': 1028888, 'cat706': 656132, 'cat707': 1056915, 'cat708': 992381, 'cat709': 930372, 'cat710': 983888, 'cat711': 1863006, 'cat712': 1465403, 'cat713': 1535885, 'cat714': 1302287, 'cat715': 660935, 'cat716': 1504557, 'cat717': 1933815, 'cat718': 1102551, 'cat719': 598036, 'cat720': 1793888, 'cat721': 1827062, 'cat722': 1847971, 'cat723': 915844, 'cat724': 662470, 'cat725': 1757673, 'cat726': 809172, 'cat727': 1195778, 'cat728': 1032551, 'cat729': 1866366, 'cat730': 1953088, 'cat731': 1138409, 'cat732': 1802647, 'cat733': 1690682, 'cat734': 779846, 'cat735': 526149, 'cat736': 1511709, 'cat737': 627215, 'cat738': 1518792, 'cat739': 1063657, 'cat740': 1909289, 'cat741': 708706, 'cat742': 1951617, 'cat743': 956536, 'cat744': 1917061, 'cat745': 1526795, 'cat746': 1109970, 'cat747': 1986610, 'cat748': 1583253, 'cat749': 1098828, 'cat750': 1474468, 'cat751': 1477058, 'cat752': 1477984, 'cat753': 748518, 'cat754': 1651496, 'cat755': 917857, 'cat756': 1153629, 'cat757': 680049, 'cat758': 1491837, 'cat759': 536709, 'cat760': 1107311, 'cat761': 1462531, 'cat762': 660357, 'cat763': 1562456, 'cat764': 1442566, 'cat765': 1063415, 'cat766': 1311279, 'cat767': 940060, 'cat768': 941888, 'cat769': 656474, 'cat770': 1719435, 'cat771': 689379, 'cat772': 797251, 'cat773': 1599045, 'cat774': 1049053, 'cat775': 1254038, 'cat776': 778092, 'cat777': 1765349, 'cat778': 1824704, 'cat779': 1566914, 'cat780': 1086296, 'cat781': 736301, 'cat782': 1975004, 'cat783': 1265855, 'cat784': 985246, 'cat785': 1544146, 'cat786': 1519510, 'cat787': 1326446, 'cat788': 552081, 'cat789': 833584, 'cat790': 507529, 'cat791': 1531161, 'cat792': 1929393, 'cat793': 1445312, 'cat794': 1350225, 'cat795': 1133237, 'cat796': 795084, 'cat797': 1372795, 'cat798': 1221336, 'cat799': 1288751, 'cat800': 1162863, 'cat801': 753565, 'cat802': 1194837, 'cat803': 503651, 'cat804': 1180625, 'cat805': 1209409, 'cat806': 1335210, 'cat807': 751744, 'cat808': 910498, 'cat809': 1995318, 'cat810': 524582, 'cat811': 1107823, 'cat812': 1031024, 'cat813': 1280606, 'cat814': 636267, 'cat815': 1323969, 'cat816': 1318227, 'cat817': 1735593, 'cat818': 660223, 'cat819': 1256462, 'cat820': 1397691, 'cat821': 1077042, 'cat822': 601225, 'cat823': 1088539, 'cat824': 713301, 'cat825': 608249, 'cat826': 1888268, 'cat827': 1098995, 'cat828': 1831614, 'cat829': 812296, 'cat830': 1022871, 'cat831': 1057272, 'cat832': 1414863, 'cat833': 1571567, 'cat834': 1161864, 'cat835': 898142, 'cat836': 1282970, 'cat837': 1397051, 'cat838': 560841, 'cat839': 1823084, 'cat840': 1338948, 'cat841': 1662143, 'cat842': 1651814, 'cat843': 926635, 'cat844': 668982, 'cat845': 603758, 'cat846': 1361690, 'cat847': 1445522, 'cat848': 1789568, 'cat849': 790607, 'cat850': 1851594, 'cat851': 1100222, 'cat852': 1518324, 'cat853': 602712, 'cat854': 1653660, 'cat855': 766991, 'cat856': 858115, 'cat857': 1490241, 'cat858': 1370038, 'cat859': 1220713, 'cat860': 1090864, 'cat861': 1124472, 'cat862': 1036331, 'cat863': 1869058, 'cat864': 1045614, 'cat865': 1351882, 'cat866': 1875721, 'cat867': 1000516, 'cat868': 1130899, 'cat869': 1513306, 'cat870': 1668788, 'cat871': 1902734, 'cat872': 1327049, 'cat873': 751118, 'cat874': 850921, 'cat875': 1848898, 'cat876': 839019, 'cat877': 657644, 'cat878': 935940, 'cat879': 1549844, 'cat880': 1542443, 'cat881': 1654244, 'cat882': 961426, 'cat883': 1449980, 'cat884': 1198004, 'cat885': 1443635, 'cat886': 1396371, 'cat887': 792754, 'cat888': 1648789, 'cat889': 903506, 'cat890': 1011884, 'cat891': 690242, 'cat892': 866362, 'cat893': 1217132, 'cat894': 1665752, 'cat895': 691038, 'cat896': 1169594, 'cat897': 1001484, 'cat898': 1272392, 'cat899': 1041815, 'cat900': 1694574, 'cat901': 923922, 'cat902': 542114, 'cat903': 1365665, 'cat904': 1302869, 'cat905': 1367976, 'cat906': 1599260, 'cat907': 940412, 'cat908': 1290344, 'cat909': 1066734, 'cat910': 1209263, 'cat911': 630148, 'cat912': 1544687, 'cat913': 1081993, 'cat914': 1704354, 'cat915': 1255279, 'cat916': 763976, 'cat917': 1940225, 'cat918': 1555697, 'cat919': 1609866, 'cat920': 1820423, 'cat921': 952907, 'cat922': 694192, 'cat923': 1068371, 'cat924': 1021045, 'cat925': 1306482, 'cat926': 1338351, 'cat927': 1854323, 'cat928': 1435032, 'cat929': 1405627, 'cat930': 1154345, 'cat931': 545739, 'cat932': 766856, 'cat933': 567619, 'cat934': 1391708, 'cat935': 1987954, 'cat936': 1492515, 'cat937': 1731398, 'cat938': 1527236, 'cat939': 500374, 'cat940': 653381, 'cat941': 1321079, 'cat942': 1607005, 'cat943': 1481784, 'cat944': 1441517, 'cat945': 1021069, 'cat946': 728686, 'cat947': 969343, 'cat948': 823754, 'cat949': 818910, 'cat950': 1595480, 'cat951': 1930415, 'cat952': 728359, 'cat953': 1970111, 'cat954': 1857587, 'cat955': 1459080, 'cat956': 678265, 'cat957': 1656581, 'cat958': 582934, 'cat959': 502864, 'cat960': 763510, 'cat961': 987748, 'cat962': 1694081, 'cat963': 578835, 'cat964': 1853723, 'cat965': 1999509, 'cat966': 1137077, 'cat967': 768365, 'cat968': 1813808, 'cat969': 1028050, 'cat970': 1607827, 'cat971': 1834399, 'cat972': 1417358, 'cat973': 1965032, 'cat974': 735159, 'cat975': 708550, 'cat976': 647538, 'cat977': 1129878, 'cat978': 1599823, 'cat979': 1722410, 'cat980': 902027, 'cat981': 1313866, 'cat982': 1047108, 'cat983': 968887, 'cat984': 1760516, 'cat985': 502415, 'cat986': 521939, 'cat987': 1627169, 'cat988': 1132335, 'cat989': 1466138, 'cat990': 1084274, 'cat991': 1163448, 'cat992': 1851773, 'cat993': 1008260, 'cat994': 1496785, 'cat995': 1603684, 'cat996': 992344, 'cat997': 1647146, 'cat998': 1018118, 'cat999': 561406}], ['entropy']],)
Expected Output: [6.844374138]
Explanation: One block of ~1000 categories totalling ~1.23e9 labels; entropy stable to 1e-6 via compensated accumulation.
Hints
- Reuse Part 1's invariant S = sum_i c_i ln c_i with total n; entropy is still ln(n) - S/n.
- Folding (label, k) in a block is the same delta as a single update generalized: S += (c+k)ln(c+k) - c ln c (only the old term if c>0), and n += k. That's O(1) per (label, k) pair, so O(k) per block — no rescan of other categories.
- Keep the same Neumaier-compensated accumulator for S across both single and block updates so a billion-label total still matches the reference to 1e-6.