How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Databricks.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Databricks during technical interviews.

Design IP/CIDR rule matcher | Databricks Coding Question

Design IP/CIDR rule matcher

Company: Databricks

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Design and implement a rule matcher that returns 'accept' or 'deny' for a given IPv4 address based on a set of rules. Each rule can be either an inclusive numeric IP range [start, end] or a CIDR block. Define the data structures to support addRule(rule), removeRule(rule), and query(ip), and explain how you resolve overlapping or conflicting rules (e.g., most-specific-wins, then newest-wins). Target up to 1e5 rules with low-latency queries. Analyze time and space complexity and outline tests to avoid bitwise operator-precedence bugs when parsing and comparing IP addresses. Follow-up: propose and compare data structures for efficient range and prefix checks (e.g., interval tree, segment tree, sorted disjoint intervals with binary search, ordered map, and binary/radix trie), and explain trade-offs for static vs dynamic updates.

Quick Answer: This question evaluates understanding of IPv4 addressing, CIDR and numeric range representations, plus competency in designing efficient data structures and algorithms for low-latency rule matching with dynamic updates.

Part 1: IPv4 Rule Matcher with Add, Remove, and Query

You are given a sequence of firewall operations over IPv4 rules. A rule is either a closed numeric range or a CIDR block, and each rule has an action: 'accept' or 'deny'. Process the operations in order and return the result of every query. Conflict resolution is deterministic: among all active rules that match the queried IP, the most specific rule wins, where specificity means the smaller covered interval length. If two matching rules have the same interval length, the newer active rule wins. If no rule matches, return 'none'. If the same rule is added multiple times, each add creates a separate active instance, and a remove deletes only the most recently added still-active identical instance. Removing a rule that is not active does nothing.

Constraints

0 <= len(operations) <= 100000
All IPv4 addresses are valid dotted-decimal IPv4 strings
CIDR prefix lengths are between 0 and 32 inclusive
Actions are either 'accept' or 'deny'
For range rules, start and end fit in the IPv4 space; treat the rule as inclusive

Examples

Input: [('add_cidr', '10.0.0.0/8', 'accept'), ('add_cidr', '10.1.0.0/16', 'deny'), ('query', '10.1.2.3'), ('query', '10.2.3.4'), ('query', '11.0.0.1')]

Expected Output: ['deny', 'accept', 'none']

Explanation: The /16 rule is more specific than the /8 rule for 10.1.2.3. The address 10.2.3.4 matches only the /8. The address 11.0.0.1 matches nothing.

Input: [('add_cidr', '192.168.1.1/32', 'deny'), ('add_range', '192.168.1.1', '192.168.1.1', 'accept'), ('query', '192.168.1.1')]

Expected Output: ['accept']

Explanation: Both rules cover exactly one IP, so they have equal specificity. The newer rule wins.

Input: [('add_range', '1.1.1.0', '1.1.1.255', 'deny'), ('add_range', '1.1.1.0', '1.1.1.255', 'accept'), ('query', '1.1.1.1'), ('remove_range', '1.1.1.0', '1.1.1.255', 'accept'), ('query', '1.1.1.1'), ('remove_range', '1.1.1.0', '1.1.1.255', 'deny'), ('query', '1.1.1.1')]

Expected Output: ['accept', 'deny', 'none']

Explanation: Two identical ranges are active, so newest wins first. After removing the newer one, the older deny remains. After removing that too, no rule matches.

Input: [('remove_cidr', '0.0.0.0/0', 'deny'), ('query', '8.8.8.8'), ('add_cidr', '0.0.0.0/0', 'deny'), ('add_cidr', '255.255.255.255/32', 'accept'), ('query', '255.255.255.255'), ('query', '0.0.0.0')]

Expected Output: ['none', 'accept', 'deny']

Explanation: Removing a non-existent rule does nothing. The /32 for 255.255.255.255 is more specific than /0, while 0.0.0.0 matches only the /0 rule.

Hints

Convert every IPv4 address to a 32-bit integer before comparing ranges. Use explicit parentheses when shifting and masking.
Because the full operation list is known in advance, compress only the IPs that are actually queried, then support range updates and point queries on those coordinates.

Part 2: Choose the Best Rule-Index Data Structure for Firewall Workloads

You are given several workload scenarios for a firewall rule engine that must support numeric IP ranges and CIDR prefixes. For each scenario, choose the cheapest indexing strategy under the exact cost model below. This problem turns the follow-up design discussion into a deterministic optimization task. A scenario may be pure-range, pure-prefix, mixed, or even empty. For mixed workloads, you may also choose a hybrid strategy that uses a radix trie for prefixes and the cheapest valid range structure for numeric ranges.

Constraints

1 <= len(scenarios) <= 100000
All count fields are integers between 0 and 10^9
coordinate_count is an integer between 0 and 10^9
Use L(x) = 0 if x <= 1, otherwise ceil(log2(x))
If multiple strategies have the same cost, use priority: sorted_disjoint_intervals < ordered_map < interval_tree < segment_tree < radix_trie < hybrid(...)

Examples

Input: [{'prefix_rules': 0, 'range_rules': 8, 'prefix_updates': 0, 'range_updates': 0, 'queries': 100, 'disjoint_ranges': True, 'coordinate_count': 0}]

Expected Output: ['sorted_disjoint_intervals']

Explanation: This is a pure static disjoint-range workload, which is exactly what sorted disjoint intervals are best at under the given model.

Input: [{'prefix_rules': 10, 'range_rules': 0, 'prefix_updates': 5, 'range_updates': 0, 'queries': 20, 'disjoint_ranges': False, 'coordinate_count': 0}]

Expected Output: ['radix_trie']

Explanation: This is a pure prefix workload, and only the radix trie directly targets prefixes without converting them to ranges.

Input: [{'prefix_rules': 50, 'range_rules': 10, 'prefix_updates': 10, 'range_updates': 5, 'queries': 100, 'disjoint_ranges': True, 'coordinate_count': 50000}]

Expected Output: ['hybrid(ordered_map+radix_trie)']

Explanation: For the mixed workload, the trie handles prefixes and the ordered map is the cheapest valid range structure, so the hybrid beats a full segment tree.

Input: [{'prefix_rules': 5, 'range_rules': 50, 'prefix_updates': 1, 'range_updates': 20, 'queries': 30, 'disjoint_ranges': False, 'coordinate_count': 64}]

Expected Output: ['segment_tree']

Explanation: The coordinate count is small enough that a segment tree over compressed coordinates becomes cheaper than the hybrid option.

Input: [{'prefix_rules': 0, 'range_rules': 0, 'prefix_updates': 0, 'range_updates': 0, 'queries': 0, 'disjoint_ranges': True, 'coordinate_count': 0}]

Expected Output: ['sorted_disjoint_intervals']

Explanation: All valid strategies have zero cost on an empty workload, so the required tie-break picks sorted_disjoint_intervals.

Hints

First classify each scenario as pure range, pure prefix, mixed, or empty. Different strategy sets are valid in different categories.
Avoid floating-point logs. In Python, ceil(log2(x)) for x > 1 can be computed as (x - 1).bit_length().

Quick Overview

This question evaluates understanding of IPv4 addressing, CIDR and numeric range representations, plus competency in designing efficient data structures and algorithms for low-latency rule matching with dynamic updates.