Implement cluster status tracker | Anthropic Coding Question

Quick Overview

This question evaluates a candidate's competence in data structures and algorithms for time-ordered state tracking, time-series indexing, conflict resolution (last-write-wins), memory-efficient storage, and concurrent system behavior under high update rates.

Company: Anthropic

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Implement a cluster status tracker. Design a class with these methods:
- update(nodeId, status, timestamp): record a node status update; updates may arrive out of order or be duplicated.
- getCurrent(nodeId): return the most recent status.
- getAt(nodeId, t): return the status effective at time t.
- getClusterSummaryAt(t): return counts of nodes by status at time t, where a node is considered OFFLINE if it has not reported in the last TTL seconds.

Use last-write-wins by (timestamp, nodeId) for conflict resolution. Aim for O(log n) per update/query and memory efficiency for up to 100k nodes and 10M updates/day. Describe data structures (e.g., per-node ordered maps, indexes for aggregation), handling of out-of-order events and idempotency, and a strategy to compact/expire historical data.

Follow-ups:
(1) Add range queries (e.g., counts per minute over the last K minutes) efficiently.
(2) Make updates and reads thread-safe under high concurrency, discussing sharding and lock strategies.

Part 1: Implement the cluster status tracker

Process a sequence of operations on a cluster tracker. Each update is ('update', node_id, status, timestamp). Updates may arrive out of order or be duplicated. If the same node receives multiple updates at the same timestamp, the later one in the input wins.

Support these operations:
- ('getCurrent', node_id): return the newest stored status for that node. This ignores TTL.
- ('getAt', node_id, t): return the status effective at time t. If the node had reported by time t but its latest report at or before t is more than ttl seconds old, return 'OFFLINE'. If the node had never reported by time t, return None.
- ('getClusterSummaryAt', t): return a dictionary of counts by status at time t, counting only nodes that have reported at least once by time t. Nodes that are OFFLINE at time t are counted under 'OFFLINE'.
- ('compact', cutoff): simulate history compaction. After compaction, queries with a time smaller than the latest cutoff are no longer guaranteed and must return None. Queries at or after the latest cutoff must still be answered correctly.

Return the answers for query operations in order.

Constraints

1 <= ttl <= 10^9
0 <= node_id, timestamp, t, cutoff <= 10^9
1 <= len(operations) <= 2 * 10^4
Status strings are non-empty and never equal to 'OFFLINE'
A node is counted in a summary at time t only if it has at least one update with timestamp <= t

Examples

Input: (5, [('update', 1, 'OK', 10), ('update', 2, 'WARN', 8), ('update', 1, 'ERROR', 7), ('getCurrent', 1), ('getAt', 1, 8), ('getClusterSummaryAt', 10)])

Expected Output: ['OK', 'ERROR', {'OK': 1, 'WARN': 1}]

Explanation: Node 1's newest update is at time 10 with status OK, but at time 8 its effective status is ERROR.

Input: (3, [('update', 1, 'OK', 5), ('update', 1, 'OK', 5), ('update', 2, 'WARN', 1), ('getAt', 2, 5), ('getClusterSummaryAt', 5)])

Expected Output: ['OFFLINE', {'OFFLINE': 1, 'OK': 1}]

Explanation: The duplicate update for node 1 changes nothing. Node 2 last reported at time 1, which is 4 seconds before t = 5 and exceeds ttl = 3, so it is OFFLINE.

Hints

For each node, keep timestamps sorted so you can binary-search the latest update not after t.
During compaction, keeping the last update before the cutoff plus all updates at or after the cutoff is enough to answer future queries for times >= cutoff.
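The two hints above can be sketched as follows. This is a minimal illustration rather than a reference solution: the function name run_tracker and its internal dictionaries are hypothetical, getCurrent on an unknown node is assumed to return None, and the summary simply scans every node instead of maintaining an aggregation index.

```python
from bisect import bisect_left, bisect_right, insort
from collections import Counter


def run_tracker(ttl, operations):
    times = {}      # node_id -> sorted list of distinct timestamps
    statuses = {}   # node_id -> {timestamp: status}
    cutoff = 0      # latest compaction cutoff; 0 means nothing compacted yet
    answers = []

    def status_at(node_id, t):
        ts = times.get(node_id, [])
        i = bisect_right(ts, t)           # number of timestamps <= t
        if i == 0:
            return None                   # the node had not reported by time t
        last = ts[i - 1]
        return 'OFFLINE' if t - last > ttl else statuses[node_id][last]

    for op in operations:
        kind = op[0]
        if kind == 'update':
            _, node_id, status, ts = op
            by_time = statuses.setdefault(node_id, {})
            if ts not in by_time:         # duplicates do not grow the history
                insort(times.setdefault(node_id, []), ts)
            by_time[ts] = status          # later input wins at the same timestamp
        elif kind == 'getCurrent':
            ts = times.get(op[1])
            answers.append(statuses[op[1]][ts[-1]] if ts else None)
        elif kind == 'getAt':
            _, node_id, t = op
            answers.append(None if t < cutoff else status_at(node_id, t))
        elif kind == 'getClusterSummaryAt':
            t = op[1]
            if t < cutoff:
                answers.append(None)
                continue
            counts = Counter(status_at(n, t) for n in times)
            counts.pop(None, None)        # drop nodes with no report by time t
            answers.append(dict(counts))
        elif kind == 'compact':
            cutoff = max(cutoff, op[1])
            for node_id, ts in times.items():
                # Keep the last update before the cutoff and everything after it.
                keep_from = max(bisect_left(ts, cutoff) - 1, 0)
                for old in ts[:keep_from]:
                    del statuses[node_id][old]
                del ts[:keep_from]
    return answers
```

Lookups are O(log n) per node through binary search. Inserting into a Python list still shifts elements, so a balanced tree or skip list would be needed to make out-of-order updates O(log n) in the worst case.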

Part 2: Add efficient range queries

You are given the full stream of cluster updates up front and then many range queries. Time is measured in whole minutes. Each event is (node_id, status, minute). Events may be out of order. If the same node has multiple events at the same minute, the later one in the input wins. A node is considered 'OFFLINE' at minute t if its latest report at or before t is more than ttl minutes old. Before a node's first report, it is not counted at all. For each query (end_minute, k), return the cluster summary for every minute in the window [end_minute - k + 1, end_minute], in chronological order. Each summary is a dictionary of counts by status and may include 'OFFLINE'. Because all events are known before answering queries, the intended solution is to preprocess the timeline once and answer all range queries efficiently.

Constraints

1 <= ttl <= 10^5
0 <= minute <= 2 * 10^5
1 <= len(events) + len(queries) <= 2 * 10^5
The number of distinct non-OFFLINE statuses is at most 20
Status strings are non-empty and never equal to 'OFFLINE'
For every query, 0 <= end_minute - k + 1 <= end_minute <= 2 * 10^5

Examples

Input: (2, [(1, 'OK', 1), (2, 'WARN', 2), (1, 'ERROR', 4)], [(4, 4)])

Expected Output: [[{'OK': 1}, {'OK': 1, 'WARN': 1}, {'OK': 1, 'WARN': 1}, {'ERROR': 1, 'WARN': 1}]]

Explanation: The query asks for summaries at minutes 1, 2, 3, and 4.

Input: (1, [(1, 'OK', 3), (1, 'WARN', 3), (2, 'OK', 1)], [(3, 3)])

Expected Output: [[{'OK': 1}, {'OK': 1}, {'OFFLINE': 1, 'WARN': 1}]]

Explanation: At minute 3, node 1 is WARN because later input wins at the same minute, and node 2 is OFFLINE.

Hints

For one node, each update creates a time interval during which that status is active: from its minute until the earlier of TTL expiry or the next update.
Difference arrays plus prefix sums let you build every minute's cluster summary once, then each query becomes a slice of the precomputed timeline.
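The interval and difference-array idea from the hints can be sketched as below. The function name range_summaries and the helper add are hypothetical, and the timeline is assumed to end at the largest end_minute among the queries, so later events are clipped.

```python
from collections import defaultdict
from itertools import accumulate


def range_summaries(ttl, events, queries):
    if not queries:
        return []
    horizon = max(end for end, _ in queries)   # last minute any query needs

    # Deduplicate: for one node and minute, the later event in the input wins.
    per_node = defaultdict(dict)
    for node_id, status, minute in events:
        per_node[node_id][minute] = status

    # One difference array per status, including 'OFFLINE'.
    diff = defaultdict(lambda: [0] * (horizon + 2))

    def add(status, start, end):
        # Mark status as active on the closed interval [start, end].
        end = min(end, horizon)
        if start <= end:
            diff[status][start] += 1
            diff[status][end + 1] -= 1

    for by_minute in per_node.values():
        minutes = sorted(by_minute)
        for i, m in enumerate(minutes):
            nxt = minutes[i + 1] if i + 1 < len(minutes) else horizon + 1
            # The reported status holds until TTL expiry or the next update.
            add(by_minute[m], m, min(m + ttl, nxt - 1))
            # Any remaining gap before the next update is OFFLINE.
            add('OFFLINE', m + ttl + 1, nxt - 1)

    # Prefix sums turn each difference array into per-minute counts.
    counts = {s: list(accumulate(d)) for s, d in diff.items()}
    timeline = [
        {s: c[t] for s, c in counts.items() if c[t]}
        for t in range(horizon + 1)
    ]
    return [timeline[end - k + 1:end + 1] for end, k in queries]
```

With at most 21 statuses (20 plus 'OFFLINE'), building the timeline costs roughly 21 operations per minute, and each query is then a slice of length k. The slices share dictionary objects with the timeline, so a caller that mutates results should copy them first.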

Part 3: Schedule thread-safe sharded requests

This problem models the sharding and lock strategy for a highly concurrent cluster tracker. There are num_shards shards. Node x belongs to shard x % num_shards.

Each request is:
- ('R', [node_ids]) for a read
- ('W', [node_ids]) for a write

A request must lock every distinct shard touched by its node list. To avoid deadlocks, locks must always be acquired in ascending shard order.

Requests are assigned to execution waves using this greedy policy: process requests in input order and place each request into the earliest existing wave where it does not conflict; if none exists, start a new wave.

Conflict rules inside one wave:
- Reads may share a shard with other reads.
- A write conflicts with any other request that touches the same shard.

Return both the lock order for every request and the wave chosen for it.
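To make the ascending lock order concrete, here is a small sketch of how a real tracker might acquire per-shard locks. The class ShardedLocks and its methods are hypothetical, and it uses plain mutexes from Python's threading module, so unlike the wave model it does not let readers share a shard.

```python
import threading
from contextlib import ExitStack


class ShardedLocks:
    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def lock_order(self, node_ids):
        # Distinct shards in ascending order: every thread uses the same
        # global order, so no cycle of waiting threads can form.
        return sorted({n % self.num_shards for n in node_ids})

    def holding(self, node_ids):
        # Acquire all needed locks, then hand them to the caller as one
        # context manager that releases them in reverse order on exit.
        with ExitStack() as stack:
            for shard in self.lock_order(node_ids):
                stack.enter_context(self.locks[shard])
            return stack.pop_all()


locks = ShardedLocks(4)
with locks.holding([6, 1, 5]):
    pass  # shards 1 and 2 are held here; read or update their nodes
```

Because every caller sorts its shards the same way, two requests that overlap on several shards always contend on the lowest shared shard first.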

Constraints

1 <= num_shards <= 10^4
0 <= node_id <= 10^9
0 <= len(requests) <= 2 * 10^4
Each request mode is either 'R' or 'W'
Each request may list the same node more than once, but a shard should only be locked once per request

Examples

Input: (4, [('R', [1, 5]), ('R', [2]), ('W', [6]), ('W', [1, 2])])

Expected Output: {'waves': 3, 'plan': [[[1], 0], [[2], 0], [[2], 1], [[1, 2], 2]]}

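Explanation: The first two reads fit in wave 0. W [6] touches shard 2, which already has a reader in wave 0, so it moves to wave 1. W [1, 2] conflicts with wave 0 on shards 1 and 2 and with wave 1 on shard 2, so it starts wave 2.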

Input: (3, [('W', [1]), ('W', [2]), ('R', [4])])

Expected Output: {'waves': 2, 'plan': [[[1], 0], [[2], 0], [[1], 1]]}

Explanation: The two writes touch different shards, so they can share wave 0. The final read touches shard 1 and must wait.

Hints

Convert each request to the set of shards it touches first. The deadlock-free lock order is just the sorted list of those shards.
For every wave, track which shards already have readers and which already have writers. Reads only need to avoid writers; writes need both sets to be disjoint.
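The greedy policy and the reader/writer sets from the hints can be sketched as below. The function name schedule is hypothetical, and the sketch scans waves linearly, so it costs O(requests × waves) set checks in the worst case.

```python
def schedule(num_shards, requests):
    waves = []   # one (reader_shards, writer_shards) pair of sets per wave
    plan = []
    for mode, node_ids in requests:
        # Distinct shards in ascending order give the deadlock-free lock order.
        shards = sorted({n % num_shards for n in node_ids})
        touched = set(shards)
        for index, (readers, writers) in enumerate(waves):
            if mode == 'R':
                # A read only has to avoid shards that already have a writer.
                conflict = not touched.isdisjoint(writers)
            else:
                # A write has to avoid both readers and writers.
                conflict = not (touched.isdisjoint(readers)
                                and touched.isdisjoint(writers))
            if not conflict:
                break
        else:
            # No existing wave can take this request, so open a new one.
            waves.append((set(), set()))
            index = len(waves) - 1
            readers, writers = waves[index]
        (readers if mode == 'R' else writers).update(touched)
        plan.append([shards, index])
    return {'waves': len(waves), 'plan': plan}
```

Both examples above are reproduced by this sketch, for instance schedule(3, [('W', [1]), ('W', [2]), ('R', [4])]) places the two writes in wave 0 and the read in wave 1.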
