Debug round-robin, DashMap, and simple cache
Company: DoorDash
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
You are given a service that routes requests to a list of nodes, each marked as either available or unavailable. The pickNode() function is intended to perform round-robin selection while skipping unavailable nodes, but a failing unit test shows it occasionally returns an unavailable node. Debug and fix the implementation:
(
1) maintain a global (thread-safe) index so selection does not reset per call;
(
2) ensure status checks are robust (e.g., use an enum or constant, not fragile string literals);
(
3) define behavior when all nodes are unavailable; and
(
4) update/add a unit test where one node is unavailable and one is available so the selector repeatedly returns the available node. As a separate debugging task, you are given a buggy custom hash map named DashMap. Diagnose and fix issues around key hashing vs equality, collision handling, resizing/rehashing, and iterator behavior. Finally, implement a very simple in-memory cache backed by a map with get/set and optional TTL or size-based eviction, and write basic tests for it.
Quick Answer: This question evaluates skills in concurrent programming and stateful selection logic (round-robin load balancing), data structure invariants and debugging (custom hash map hashing vs equality, collision resolution, resizing and iterator correctness), simple in-memory caching strategies with TTL or size-based eviction, and unit test design.
Part 1: Round-Robin Available Node Selector
Implement a **round-robin selector** over a fixed list of service nodes, where each node can be toggled available or unavailable while you keep handing out the next available node in circular order.
## What to implement
```python
def solution(nodes, operations):
...
```
- **`nodes`** — a list of `(node_id, status)` tuples describing the nodes, in their fixed order. `node_id` is a unique string; `status` is `1` (**available**) or `0` (**unavailable**).
- **`operations`** — a list of operation tuples to process **in the given order**.
Return a **list** containing one result for **each `pick` operation only**, in the order those picks were processed.
## Operations
Process operations one at a time. There are two kinds:
- **`('pick',)`** — Select and return the **next available node**, then record its result in the output list. Selection uses a **persistent round-robin pointer** (described below).
- **`('set', node_id, status)`** — Update that node's availability to `status` (`1` = available, `0` = unavailable). This operation produces **no output**.
Any empty/falsy operation in the list is ignored.
## Round-robin pointer rules
- The pointer is an **index into the node list** and starts at index `0`.
- A `pick` returns the node at the **smallest index `>=` the current pointer that is currently available**. If no available node exists at or after the pointer, the search **wraps around** and continues from index `0`, returning the first available node before the pointer.
- After a successful pick of the node at index `idx`, advance the pointer to `(idx + 1) % len(nodes)`, so the next `pick` continues from just after the node just chosen.
- The pointer **persists across picks** (and is unaffected by `set` operations except through availability changes).
## Return values for a pick
- On success, append the chosen node's **`node_id`** (a string).
- If **no node is currently available** (every node has status `0`, or there are no nodes at all), append **`-1`** (the integer). In this case the pointer is left unchanged.
## Notes & edge cases
- Status is **numeric** (`0`/`1`); compare against these numeric constants rather than string-based checks.
- If `nodes` is empty, every `pick` returns `-1`.
- A `set` may change a node's status to a value it already has; this is a valid no-op update.
## Examples
- `nodes = [('A', 0), ('B', 1)]`, `operations = [('pick',), ('pick',), ('pick',)]` → `['B', 'B', 'B']` (A is unavailable, so every pick returns B).
- `nodes = [('A', 1), ('B', 1), ('C', 1)]`, `operations = [('pick',), ('pick',), ('set', 'B', 0), ('pick',), ('pick',), ('set', 'A', 0), ('set', 'C', 0), ('pick',)]` → `['A', 'B', 'C', 'A', -1]`.
- `nodes = []`, `operations = [('pick',), ('pick',)]` → `[-1, -1]`.
- `nodes = [('A', 1), ('B', 0), ('C', 1)]`, `operations = [('pick',), ('pick',), ('pick',)]` → `['A', 'C', 'A']` (B is skipped; after C the pointer wraps back to A).
## Constraints
- `0 <= len(nodes) <= 200000`
- `0 <= len(operations) <= 200000`
- Each `node_id` is unique.
- Each `status` is either `0` or `1`.
Constraints
- 0 <= len(nodes) <= 200000
- 0 <= len(operations) <= 200000
- Each node_id is unique
- Each status is either 0 or 1
Examples
Input: ([('A', 0), ('B', 1)], [('pick',), ('pick',), ('pick',)])
Expected Output: ['B', 'B', 'B']
Explanation: Only B is available, so every pick returns B.
Input: ([('A', 1), ('B', 1), ('C', 1)], [('pick',), ('pick',), ('set', 'B', 0), ('pick',), ('pick',), ('set', 'A', 0), ('set', 'C', 0), ('pick',)])
Expected Output: ['A', 'B', 'C', 'A', -1]
Explanation: The pointer keeps moving forward across picks. After all nodes become unavailable, pick returns -1.
Input: ([], [('pick',), ('pick',)])
Expected Output: [-1, -1]
Explanation: With no nodes at all, every pick fails.
Input: ([('A', 1), ('B', 0), ('C', 1)], [('pick',), ('pick',), ('pick',)])
Expected Output: ['A', 'C', 'A']
Explanation: The selector wraps around and keeps skipping the unavailable node B.
Solution
def solution(nodes, operations):
n = len(nodes)
if n == 0:
return [-1 for op in operations if op and op[0] == 'pick']
ids = []
id_to_index = {}
status = [0] * n
for i, (node_id, state) in enumerate(nodes):
ids.append(node_id)
id_to_index[node_id] = i
status[i] = 1 if state == 1 else 0
size = 1
while size < n:
size *= 2
tree = [0] * (2 * size)
for i, val in enumerate(status):
tree[size + i] = val
for i in range(size - 1, 0, -1):
tree[i] = tree[2 * i] + tree[2 * i + 1]
def update(idx, val):
pos = size + idx
tree[pos] = val
pos //= 2
while pos:
tree[pos] = tree[2 * pos] + tree[2 * pos + 1]
pos //= 2
def range_sum(left, right):
if left > right:
return 0
left += size
right += size
total = 0
while left <= right:
if left % 2 == 1:
total += tree[left]
left += 1
if right % 2 == 0:
total += tree[right]
right -= 1
left //= 2
right //= 2
return total
def find_first(left, right):
if left > right:
return -1
def dfs(node, node_left, node_right):
if node_right < left or node_left > right or tree[node] == 0:
return -1
if node_left == node_right:
return node_left
mid = (node_left + node_right) // 2
res = dfs(node * 2, node_left, mid)
if res != -1:
return res
return dfs(node * 2 + 1, mid + 1, node_right)
res = dfs(1, 0, size - 1)
return res if 0 <= res < n else -1
pointer = 0
answer = []
for op in operations:
if not op:
continue
kind = op[0]
if kind == 'set':
node_id, new_state = op[1], op[2]
idx = id_to_index[node_id]
new_val = 1 if new_state == 1 else 0
if status[idx] != new_val:
status[idx] = new_val
update(idx, new_val)
elif kind == 'pick':
if tree[1] == 0:
answer.append(-1)
continue
if range_sum(pointer, n - 1) > 0:
idx = find_first(pointer, n - 1)
else:
idx = find_first(0, pointer - 1)
answer.append(ids[idx])
pointer = (idx + 1) % n
return answerExplanation
The selector must repeatedly find the **next available node** from a persistent round-robin pointer while supporting status flips, all in logarithmic time. A linear scan per `pick` would be O(n) worst case, so the solution uses a **segment tree of availability counts**.
**Setup.** Each node maps to an index via `id_to_index`; `status[i]` is `1` (available) or `0`. The tree is sized to the next power of two ≥ `n`; leaf `size+i` holds `status[i]`, and every internal node stores the **sum** of its children — i.e. how many available nodes live in its subtree. `tree[1]` is the global available count.
**Operations.**
- `set node_id status`: only touches the tree when the value actually changes, calling `update` to rewrite one leaf and bubble new sums to the root — O(log n).
- `pick`: if `tree[1] == 0`, no node is available, so it appends `-1`. Otherwise it looks forward first. `range_sum(pointer, n-1)` (an iterative bottom-up sum) tests whether anything is available from the pointer onward; if so, `find_first(pointer, n-1)` descends the tree, pruning any subtree whose sum is `0`, to return the **leftmost** available index ≥ pointer. If the forward range is empty, it wraps and searches `[0, pointer-1]`. The chosen node's id is recorded and the pointer advances to `(idx+1) % n`, so the next pick continues round-robin.
**Why it's correct.** `find_first` always returns the smallest index in range with a positive subtree sum, which is exactly the next available node in scan order; the forward-then-wrap split reproduces circular traversal; pointer state persists across picks. Empty `nodes` short-circuits to `-1`s.
Time complexity: O((n + q) log n), where n = number of nodes and q = number of operations. Building the tree is O(n); each `pick` (range_sum + find_first) and each `set` (update) costs O(log n).. Space complexity: O(n) — the segment tree uses 2*size ≤ 4n slots, plus the ids list, status array, and id-to-index map, all O(n)..
Hints
- Keep one pointer that survives across all pick operations instead of restarting at index 0 each time.
- A segment tree or other indexed structure can help find the next available node quickly after updates.
Part 2: Implement and Repair DashMap
Implement a custom hash map called **DashMap** from scratch using **separate chaining**, then replay a sequence of operations against it and return the results.
Write a function with this signature:
```python
def solution(initial_capacity, operations):
...
```
## What to build
DashMap stores **string keys** mapped to **integer values**. Build it on top of an array of buckets, where each bucket is a list that holds every `(key, value)` pair whose key hashes to that slot.
**Hash function (use exactly this).** For the current table `capacity`, the slot for a key is:
```
hash(key) = sum(ord(c) for c in key) % capacity
```
This deterministic hash makes collisions easy to produce. Because distinct keys can share the same slot, you must compare keys with **normal string equality** (`k == key`) when searching a bucket, so two different keys with the same hash still coexist correctly.
**Capacity floor.** Treat the working capacity as at least `2` (i.e. `max(2, initial_capacity)`), so the table is never empty and you never take a modulus by `0`.
## Inputs
- `initial_capacity` — a non-negative integer, the requested starting number of buckets (floored to a minimum of `2` as described above).
- `operations` — a list of operation tuples to apply **in order**. Each operation is one of:
- `('put', key, value)` — insert or update `key` with `value`.
- `('get', key)` — look up `key`.
- `('remove', key)` — delete `key`.
- `('items',)` — snapshot all current entries.
## Operation semantics
- **`('put', key, value)`** — If `key` already exists, overwrite its value in place (the size does not change). Otherwise, add a new `(key, value)` entry and increase the size. This operation produces **no value** in the result list.
- **`('get', key)`** — Return the value mapped to `key`, or `None` if `key` is not present.
- **`('remove', key)`** — If `key` exists, delete it and return `True`. If `key` is not present, return `False`.
- **`('items',)`** — Return a list of all current live `(key, value)` pairs, each appearing **exactly once**, **sorted by key** in ascending order for deterministic output.
## Resizing
After a **new** key is inserted by a `put` (not on an update of an existing key), check the load factor. Whenever
```
size > capacity * 0.75
```
(strictly greater than `0.75`), **double the capacity** and **rehash** every existing entry into the larger table using the new capacity. Resizing only ever happens on growth from a `put`; `get`, `remove`, and `items` never trigger it.
## Return value
Return a single list containing the result of each operation that produces an output, in the order those operations were applied. Concretely, append one entry per `get` (its value or `None`), one entry per `remove` (`True`/`False`), and one entry per `items` (the sorted list of pairs). `put` operations contribute nothing to this list.
## Example
For `initial_capacity = 2` and operations:
```
[('put', 'ab', 1), ('put', 'ba', 2), ('get', 'ab'), ('get', 'ba'), ('items',)]
```
`'ab'` and `'ba'` hash to the same slot but must coexist. The function returns:
```
[1, 2, [('ab', 1), ('ba', 2)]]
```
## Constraints
- `0 <= initial_capacity <= 10000`
- `0 <= len(operations) <= 20000`
- Keys are strings of length `0` to `100`.
- Values are integers.
Constraints
- 0 <= initial_capacity <= 10000
- 0 <= len(operations) <= 20000
- Keys are strings of length 0 to 100
- Values are integers
Examples
Input: (2, [('put', 'ab', 1), ('put', 'ba', 2), ('get', 'ab'), ('get', 'ba'), ('items',)])
Expected Output: [1, 2, [('ab', 1), ('ba', 2)]]
Explanation: 'ab' and 'ba' have the same hash under the given function, so this checks collision handling.
Input: (4, [('put', 'aa', 1), ('put', 'aa', 5), ('get', 'aa'), ('items',)])
Expected Output: [5, [('aa', 5)]]
Explanation: Inserting the same key again should update the existing entry, not create a duplicate.
Input: (2, [('put', 'a', 1), ('put', 'b', 2), ('put', 'c', 3), ('put', 'd', 4), ('get', 'c'), ('remove', 'b'), ('get', 'b'), ('items',)])
Expected Output: [3, True, None, [('a', 1), ('c', 3), ('d', 4)]]
Explanation: This forces resizing and then verifies that all remaining entries are still retrievable and iterable.
Input: (4, [('get', 'x'), ('remove', 'x'), ('items',)])
Expected Output: [None, False, []]
Explanation: Edge case with an empty map.
Solution
def solution(initial_capacity, operations):
capacity = max(2, initial_capacity)
buckets = [[] for _ in range(capacity)]
size = 0
def hash_key(key, mod):
total = 0
for ch in key:
total += ord(ch)
return total % mod
def find_in_bucket(bucket, key):
for i, (k, v) in enumerate(bucket):
if k == key:
return i
return -1
def rehash(new_capacity):
nonlocal buckets, capacity
old_buckets = buckets
capacity = max(2, new_capacity)
buckets = [[] for _ in range(capacity)]
for bucket in old_buckets:
for k, v in bucket:
idx = hash_key(k, capacity)
buckets[idx].append((k, v))
answer = []
for op in operations:
if not op:
continue
kind = op[0]
if kind == 'put':
key, value = op[1], op[2]
idx = hash_key(key, capacity)
bucket = buckets[idx]
pos = find_in_bucket(bucket, key)
if pos != -1:
bucket[pos] = (key, value)
else:
bucket.append((key, value))
size += 1
if size > capacity * 0.75:
rehash(capacity * 2)
elif kind == 'get':
key = op[1]
idx = hash_key(key, capacity)
bucket = buckets[idx]
pos = find_in_bucket(bucket, key)
answer.append(bucket[pos][1] if pos != -1 else None)
elif kind == 'remove':
key = op[1]
idx = hash_key(key, capacity)
bucket = buckets[idx]
pos = find_in_bucket(bucket, key)
if pos == -1:
answer.append(False)
else:
bucket.pop(pos)
size -= 1
answer.append(True)
elif kind == 'items':
items = []
for bucket in buckets:
for k, v in bucket:
items.append((k, v))
items.sort(key=lambda pair: pair[0])
answer.append(items)
return answerExplanation
This builds a hash map from scratch using **separate chaining**: `buckets` is a list of lists, where each bucket holds `(key, value)` tuples that collided to the same slot.
**Hashing.** `hash_key(key, mod)` sums the `ord` of every character and takes it mod the current `mod`. Capacity is floored at 2 (`max(2, initial_capacity)`) so the table is never empty and never `% 0`.
**Per-operation logic.** Each op is dispatched by `op[0]`:
- `put` — compute the bucket, then `find_in_bucket` does a linear scan. If the key already lives in the bucket, the tuple is overwritten in place (no size change); otherwise it's appended and `size += 1`. Because distinct keys can share a hash, equality is checked by `k == key`, so collisions coexist correctly.
- `get` — scan the bucket and append the value, or `None` if absent.
- `remove` — scan, `pop` the entry and append `True`, or append `False` if the key isn't there.
- `items` — gather every live pair from all buckets and `sort` by key for deterministic output.
**Resizing.** Right after a *new* insertion, if `size > capacity * 0.75` the table doubles via `rehash`: it allocates fresh buckets at the new capacity and re-inserts every existing entry using the new modulus. This keeps the load factor low so chains stay short and average lookups stay O(1). `rehash` is only triggered on growth (puts), matching the spec.
The approach is correct because keys are matched by real string equality inside each chain, hash collisions never lose data, and rehashing preserves all entries under the new modulus.
Time complexity: Average O(1) per put/get/remove; O(n log n) per items call. Space complexity: O(n).
Hints
- A matching hash is not enough; inside a bucket you still need to compare actual keys for equality.
- When capacity changes, bucket indices change too, so every existing entry must be inserted again using the new modulus.
Part 3: Tiny In-Memory Cache with TTL and LRU Eviction
Implement a fixed-capacity **in-memory cache** that supports **per-entry TTL (time-to-live) expiration** and **LRU (least-recently-used) eviction**.
Implement the function:
```python
def solution(capacity, operations):
```
- `capacity` is a non-negative integer: the maximum number of live entries the cache may hold.
- `operations` is a list of operation tuples to process **in order**. Each tuple's first element is its kind (`'set'`, `'get'`, or `'keys'`), and every operation carries an integer timestamp. **Timestamps are non-decreasing** across the operation list.
Return a **list** containing one result for each `'get'` and `'keys'` operation, in the order those operations occur. `'set'` operations produce **no** output.
## Recency model
The cache tracks how recently each live key was used, from **least recently used (LRU)** to **most recently used (MRU)**:
- A `'set'` makes its key the **most recently used** entry.
- A **successful** `'get'` (the key is still live) also makes that key the **most recently used** entry.
## Expiration (applied before every operation)
Before processing **any** operation at timestamp `now`, first **remove all expired entries**.
- An entry created by `('set', key, value, ttl, time)` with `ttl is not None` expires at `time + ttl`. It is considered **expired (unavailable) at that exact instant** — i.e. an entry is removed once `now >= expiry_time`.
- An entry whose `ttl` is `None` **never expires**.
## Operations
**`('set', key, value, ttl, time)`** — Store (or overwrite) `key`.
- `ttl` is either `None` (no expiration) or a **positive integer** (`>= 1`); if not `None`, the entry expires at `time + ttl`.
- Storing or overwriting a key marks it as **most recently used**.
- After the insert, if the number of live entries **exceeds** `capacity`, **evict the least recently used live entry**.
- **Special case:** if `capacity <= 0`, the cache holds nothing — the key is **not** stored, and if it already existed it is removed.
- A `'set'` appends nothing to the output list.
**`('get', key, time)`** — Look up `key`.
- If the key is still live, append its **value** to the output and mark the key as **most recently used**.
- Otherwise, append `None`.
**`('keys', time)`** — Append a **list of the currently live keys**, ordered from **least recently used to most recently used**.
## Constraints
- `0 <= capacity <= 100000`
- `0 <= len(operations) <= 100000`
- Operation timestamps are non-decreasing.
- `ttl` is either `None` or an integer `>= 1`.
## Examples
Given `capacity = 2` and operations:
```
('set', 'a', 1, None, 0)
('set', 'b', 2, None, 1)
('get', 'a', 2) -> 1 (a becomes most recently used; order is now b, a)
('set', 'c', 3, None, 3) -> evicts b (the LRU live entry)
('get', 'b', 4) -> None (b was evicted)
('keys', 4) -> ['a', 'c']
```
Returns `[1, None, ['a', 'c']]`.
Given `capacity = 2`:
```
('set', 'x', 10, 2, 0) -> x expires at time 2
('get', 'x', 1) -> 10
('get', 'x', 2) -> None (expired exactly at time 2)
('keys', 2) -> []
```
Returns `[10, None, []]`.
Constraints
- 0 <= capacity <= 100000
- 0 <= len(operations) <= 100000
- Operation timestamps are non-decreasing
- ttl is either None or an integer >= 1
Examples
Input: (2, [('set', 'a', 1, None, 0), ('set', 'b', 2, None, 1), ('get', 'a', 2), ('set', 'c', 3, None, 3), ('get', 'b', 4), ('keys', 4)])
Expected Output: [1, None, ['a', 'c']]
Explanation: After 'get a', A becomes most recently used, so inserting C evicts B.
Input: (2, [('set', 'x', 10, 2, 0), ('get', 'x', 1), ('get', 'x', 2), ('keys', 2)])
Expected Output: [10, None, []]
Explanation: The entry for X expires exactly at time 2.
Input: (2, [('set', 'a', 1, 5, 0), ('set', 'b', 2, None, 1), ('set', 'a', 7, None, 2), ('get', 'a', 6), ('keys', 6)])
Expected Output: [7, ['b', 'a']]
Explanation: Overwriting A removes the old TTL logically and keeps A as the most recently used key.
Input: (0, [('set', 'a', 1, None, 0), ('get', 'a', 1), ('keys', 1)])
Expected Output: [None, []]
Explanation: With zero capacity, nothing can be stored.
Solution
def solution(capacity, operations):
from collections import OrderedDict
import heapq
data = {}
order = OrderedDict()
expiry_heap = []
version = 0
answer = []
def cleanup(now):
while expiry_heap and expiry_heap[0][0] <= now:
exp_time, exp_version, key = heapq.heappop(expiry_heap)
current = data.get(key)
if current is None:
continue
value, current_exp, current_version = current
if current_exp == exp_time and current_version == exp_version and current_exp is not None and current_exp <= now:
del data[key]
if key in order:
del order[key]
for op in operations:
if not op:
continue
kind = op[0]
if kind == 'set':
key, value, ttl, now = op[1], op[2], op[3], op[4]
cleanup(now)
if capacity <= 0:
if key in data:
del data[key]
if key in order:
del order[key]
continue
version += 1
exp_time = None if ttl is None else now + ttl
data[key] = (value, exp_time, version)
order[key] = None
order.move_to_end(key)
if exp_time is not None:
heapq.heappush(expiry_heap, (exp_time, version, key))
if len(data) > capacity:
old_key, _ = order.popitem(last=False)
del data[old_key]
elif kind == 'get':
key, now = op[1], op[2]
cleanup(now)
if key not in data:
answer.append(None)
else:
value, exp_time, current_version = data[key]
order.move_to_end(key)
answer.append(value)
elif kind == 'keys':
now = op[1]
cleanup(now)
answer.append(list(order.keys()))
return answerExplanation
This is a fixed-capacity cache that combines **per-entry TTL expiry** with **LRU eviction**, using lazy (deferred) expiration.
**Data structures**
- `data[key] = (value, exp_time, version)` — the live store. `exp_time` is `None` (never expires) or `now + ttl`.
- `order` (an `OrderedDict`) tracks recency only: front = least recently used, back = most recently used.
- `expiry_heap` — a min-heap of `(exp_time, version, key)` so the soonest expiry is found in O(log n).
- `version` — a monotonic counter that stamps each `set`, used to invalidate stale heap entries.
**Per-operation flow.** Every op first calls `cleanup(now)`, which pops heap entries with `exp_time <= now` (entries expire *at* their expiry instant). For each popped entry it re-checks `data[key]`: it only deletes if the stored `exp_time` and `version` still match the popped entry. That guard is the crux of correctness — if a key was overwritten (e.g. given a longer/no TTL), the old heap entry carries a stale `version` and is harmlessly skipped, so a reset key isn't wrongly evicted.
- **set**: cleanup, then (if `capacity <= 0`) just drop the key. Otherwise bump `version`, store the entry, `move_to_end` to mark it MRU, push a heap entry if it has a TTL, and if `len(data) > capacity` evict the front (LRU) of `order`.
- **get**: cleanup; return `value` and `move_to_end` (counts as a use) if live, else `None`.
- **keys**: cleanup, then return `list(order.keys())` in LRU→MRU order.
`data` and `order` stay in lockstep, so the answer always reflects live, correctly-ordered entries.
Time complexity: O(log n) amortized per operation (heap push/pop dominate; dict and OrderedDict ops are O(1)); O(m log n) total for m operations.. Space complexity: O(n + t), where n is live keys and t is pending TTL entries still in the heap (stale entries are purged lazily during cleanup)..
Hints
- Use a hash map for fast lookup and another structure to track recency order for LRU eviction.
- For TTL, a min-heap of expiration times works well if you attach a version number so stale heap entries can be ignored.