Design an IP filter using CIDR rules
Company: Databricks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of IP addressing and CIDR notation, bit-level conversion between prefixes and integer ranges, and the design of scalable data structures and algorithms for prefix/range queries, rule priority handling, and rule-set operations (allow/deny semantics).
Part 1: Dynamic IPv4 firewall with CIDR matching
Constraints
- 1 <= len(commands) <= 200000
- 0 <= priority <= 10^9
- All IP addresses are valid IPv4 addresses and all CIDR prefixes satisfy 0 <= p <= 32
- REMOVE on a non-existent or already removed rule does nothing
Examples
Input: ("priority", ["ADD 10.0.0.0/8 allow 5", "ADD 10.1.0.0/16 deny 10", "QUERY 10.1.2.3", "ADD 10.1.2.0/24 deny 1", "QUERY 10.1.2.3", "REMOVE 3", "QUERY 10.1.2.3"])
Expected Output: ["allow", "deny", "allow"]
Explanation: In priority mode, the smallest priority wins among all matching prefixes. After removing rule 3, the /8 allow rule wins again.
Input: ("lpm", ["ADD 0.0.0.0/0 deny 100", "ADD 192.168.1.0/24 allow 50", "ADD 192.168.1.128/25 deny 1", "QUERY 192.168.1.10", "QUERY 192.168.1.200", "QUERY 8.8.8.8"])
Expected Output: ["allow", "deny", "deny"]
Explanation: In longest-prefix-match mode, the deepest matching prefix wins before priority is considered.
Input: ("priority", ["QUERY 1.1.1.1", "REMOVE 1", "ADD 255.255.255.255/32 allow 7", "QUERY 255.255.255.255", "QUERY 255.255.255.254"])
Expected Output: ["none", "allow", "none"]
Explanation: Edge case: querying with no active rules returns none. A /32 rule matches exactly one address.
Input: ("priority", ["ADD 1.1.1.0/24 deny 3", "ADD 1.1.1.0/24 allow 3", "QUERY 1.1.1.4", "REMOVE 1", "QUERY 1.1.1.4"])
Expected Output: ["deny", "allow"]
Explanation: When both priority and prefix length tie, the smaller rule id wins.
Solution
def solution(mode, commands):
import heapq
def ip_to_int(ip):
a, b, c, d = map(int, ip.split('.'))
return (a << 24) | (b << 16) | (c << 8) | d
def cidr_to_network(cidr):
ip, plen = cidr.split('/')
plen = int(plen)
x = ip_to_int(ip)
if plen == 0:
start = 0
else:
mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
start = x & mask
return start, plen
class Node:
__slots__ = ('child', 'heap')
def __init__(self):
self.child = [None, None]
self.heap = []
root = Node()
active = {}
next_id = 1
def top_active(node):
h = node.heap
while h and h[0][1] not in active:
heapq.heappop(h)
return h[0] if h else None
answers = []
for cmd in commands:
parts = cmd.split()
op = parts[0]
if op == 'ADD':
cidr = parts[1]
action = parts[2]
priority = int(parts[3])
start, plen = cidr_to_network(cidr)
node = root
for i in range(plen):
bit = (start >> (31 - i)) & 1
if node.child[bit] is None:
node.child[bit] = Node()
node = node.child[bit]
heapq.heappush(node.heap, (priority, next_id, action))
active[next_id] = (node, plen, priority, action)
next_id += 1
elif op == 'REMOVE':
rid = int(parts[1])
active.pop(rid, None)
elif op == 'QUERY':
ip = ip_to_int(parts[1])
node = root
if mode == 'priority':
best = top_active(node)
else:
cand = top_active(node)
best = (0, cand) if cand is not None else None
depth = 0
while depth < 32:
bit = (ip >> (31 - depth)) & 1
nxt = node.child[bit]
if nxt is None:
break
node = nxt
depth += 1
cand = top_active(node)
if cand is None:
continue
if mode == 'priority':
if best is None or (cand[0], cand[1]) < (best[0], best[1]):
best = cand
else:
best = (depth, cand)
if mode == 'priority':
answers.append(best[2] if best is not None else 'none')
else:
answers.append(best[1][2] if best is not None else 'none')
return answers
Time complexity: ADD: O(32 + log H), REMOVE: O(1) amortized, QUERY: O(32 + 32 log H), where H is the relevant heap size at visited trie nodes. Since IPv4 has only 32 bits, this is effectively O(log H) per operation with a small constant.. Space complexity: O(T + R), where T is the number of trie nodes created and R is the number of active or lazily removed rules stored in heaps..
Hints
- A binary trie over the 32 bits of an IPv4 address lets you inspect every matching prefix on one root-to-leaf path.
- For removals, lazy deletion works well: mark a rule inactive, and discard it later when it reaches the top of a heap.
Part 2: CIDR block overlap and coverage queries
Constraints
- 1 <= len(commands) <= 20000
- All addresses are valid IPv4 CIDR blocks with 0 <= p <= 32
- REMOVE on a non-existent or already removed rule does nothing
- The intended solution supports fast overlap and coverage queries; CONFLICTS may spend O(A) time scanning active rules to enumerate matching ids
Examples
Input: ["ADD 10.0.0.0/25 allow", "ADD 10.0.0.128/25 allow", "COVERED_ALLOW 10.0.0.0/24", "OVERLAP 10.0.1.0/24", "ADD 10.0.0.64/26 deny", "CONFLICTS 10.0.0.0/24", "COVERED_DENY 10.0.0.64/26", "REMOVE 3", "CONFLICTS 10.0.0.0/24"]
Expected Output: ["YES", "NO", "[1, 2, 3]", "YES", "[]"]
Explanation: Two adjacent /25 allow rules together cover the full /24. After adding a deny block inside that range, the overlapping rules on the /24 contain both actions, so they conflict.
Input: ["ADD 0.0.0.0/1 deny", "ADD 128.0.0.0/1 deny", "COVERED_DENY 0.0.0.0/0", "OVERLAP 192.168.1.0/24", "CONFLICTS 192.168.1.0/24"]
Expected Output: ["YES", "YES", "[]"]
Explanation: The two /1 deny rules partition the whole IPv4 space, so together they cover /0. There is overlap with the query block, but no allow rule, so no conflict.
Input: ["OVERLAP 1.2.3.0/24", "COVERED_ALLOW 1.2.3.0/24", "CONFLICTS 1.2.3.0/24"]
Expected Output: ["NO", "NO", "[]"]
Explanation: Edge case: with no rules, there is no overlap, no coverage, and no conflict.
Input: ["ADD 255.255.255.255/32 allow", "COVERED_ALLOW 255.255.255.255/32", "OVERLAP 255.255.255.254/31", "REMOVE 1", "OVERLAP 255.255.255.255/32"]
Expected Output: ["YES", "YES", "NO"]
Explanation: A /32 rule covers exactly one address. The /31 query overlaps it before removal and does not overlap after removal.
Input: ["ADD 192.168.1.0/25 allow", "COVERED_ALLOW 192.168.1.0/24"]
Expected Output: ["NO"]
Explanation: A single /25 covers only half of the /24, so full coverage fails.
Solution
def solution(commands):
def ip_to_int(ip):
a, b, c, d = map(int, ip.split('.'))
return (a << 24) | (b << 16) | (c << 8) | d
cidr_cache = {}
def cidr_to_range(cidr):
if cidr in cidr_cache:
return cidr_cache[cidr]
ip, plen = cidr.split('/')
plen = int(plen)
x = ip_to_int(ip)
if plen == 0:
start = 0
else:
mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
start = x & mask
size = 1 << (32 - plen)
end = start + size - 1
cidr_cache[cidr] = (start, end)
return start, end
parsed = []
coords = {0, 1 << 32}
for cmd in commands:
parts = cmd.split()
op = parts[0]
if op == 'ADD':
cidr = parts[1]
action = parts[2]
s, e = cidr_to_range(cidr)
coords.add(s)
coords.add(e + 1)
parsed.append((op, s, e, action))
elif op == 'REMOVE':
parsed.append((op, int(parts[1])))
else:
cidr = parts[1]
s, e = cidr_to_range(cidr)
coords.add(s)
coords.add(e + 1)
parsed.append((op, s, e))
coords = sorted(coords)
pos = {x: i for i, x in enumerate(coords)}
seg_n = len(coords) - 1
class SegTree:
__slots__ = ('n', 'minv', 'maxv', 'lazy')
def __init__(self, n):
self.n = n
self.minv = [0] * (4 * n)
self.maxv = [0] * (4 * n)
self.lazy = [0] * (4 * n)
def _apply(self, idx, val):
self.minv[idx] += val
self.maxv[idx] += val
self.lazy[idx] += val
def _push(self, idx):
val = self.lazy[idx]
if val:
self._apply(idx * 2, val)
self._apply(idx * 2 + 1, val)
self.lazy[idx] = 0
def add(self, l, r, val, idx=1, tl=0, tr=None):
if tr is None:
tr = self.n - 1
if l > r:
return
if l == tl and r == tr:
self._apply(idx, val)
return
self._push(idx)
tm = (tl + tr) // 2
if r <= tm:
self.add(l, r, val, idx * 2, tl, tm)
elif l > tm:
self.add(l, r, val, idx * 2 + 1, tm + 1, tr)
else:
self.add(l, tm, val, idx * 2, tl, tm)
self.add(tm + 1, r, val, idx * 2 + 1, tm + 1, tr)
self.minv[idx] = min(self.minv[idx * 2], self.minv[idx * 2 + 1])
self.maxv[idx] = max(self.maxv[idx * 2], self.maxv[idx * 2 + 1])
def query(self, l, r, idx=1, tl=0, tr=None):
if tr is None:
tr = self.n - 1
if l == tl and r == tr:
return self.minv[idx], self.maxv[idx]
self._push(idx)
tm = (tl + tr) // 2
if r <= tm:
return self.query(l, r, idx * 2, tl, tm)
if l > tm:
return self.query(l, r, idx * 2 + 1, tm + 1, tr)
left = self.query(l, tm, idx * 2, tl, tm)
right = self.query(tm + 1, r, idx * 2 + 1, tm + 1, tr)
return min(left[0], right[0]), max(left[1], right[1])
any_tree = SegTree(seg_n)
allow_tree = SegTree(seg_n)
deny_tree = SegTree(seg_n)
active = {}
next_id = 1
answers = []
for item in parsed:
op = item[0]
if op == 'ADD':
_, s, e, action = item
l = pos[s]
r = pos[e + 1] - 1
active[next_id] = (s, e, l, r, action)
any_tree.add(l, r, 1)
if action == 'allow':
allow_tree.add(l, r, 1)
else:
deny_tree.add(l, r, 1)
next_id += 1
elif op == 'REMOVE':
rid = item[1]
info = active.pop(rid, None)
if info is not None:
_, _, l, r, action = info
any_tree.add(l, r, -1)
if action == 'allow':
allow_tree.add(l, r, -1)
else:
deny_tree.add(l, r, -1)
elif op == 'OVERLAP':
_, s, e = item
l = pos[s]
r = pos[e + 1] - 1
answers.append('YES' if any_tree.query(l, r)[1] > 0 else 'NO')
elif op == 'COVERED_ALLOW':
_, s, e = item
l = pos[s]
r = pos[e + 1] - 1
answers.append('YES' if allow_tree.query(l, r)[0] > 0 else 'NO')
elif op == 'COVERED_DENY':
_, s, e = item
l = pos[s]
r = pos[e + 1] - 1
answers.append('YES' if deny_tree.query(l, r)[0] > 0 else 'NO')
elif op == 'CONFLICTS':
_, s, e = item
l = pos[s]
r = pos[e + 1] - 1
allow_max = allow_tree.query(l, r)[1]
deny_max = deny_tree.query(l, r)[1]
if allow_max == 0 or deny_max == 0:
answers.append('[]')
continue
ids = []
seen_allow = False
seen_deny = False
for rid, (rs, re, _, _, action) in active.items():
if not (re < s or rs > e):
ids.append(rid)
if action == 'allow':
seen_allow = True
else:
seen_deny = True
if seen_allow and seen_deny:
answers.append(str(sorted(ids)))
else:
answers.append('[]')
return answers
Time complexity: Preprocessing: O(C log C), where C is the number of distinct compressed coordinates. ADD, REMOVE, OVERLAP, COVERED_ALLOW, and COVERED_DENY are O(log C). CONFLICTS is O(log C + A), where A is the number of active rules scanned to enumerate overlaps.. Space complexity: O(C + A), where C is the number of compressed coordinates and A is the number of active rules..
Hints
- Turn each CIDR block into an integer interval [start, end]. Then overlap becomes a range problem.
- Because all commands are known before processing starts, coordinate compression plus a range-add segment tree can maintain coverage counts over elementary IP segments.